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Solamente el estupor conoce. 
San Gregorio de Nisa 



Mucho razonamicnto y poca observacion Uevan al error, 
Mucha observacion y poco razonamiento Uevan a la verdad. 

Alexis Carrel 



Uno no puede evitar asombrarse cuando contempla los misterios de la eternidad, 

(Ic la vida, do la maravillosa cstructura do la rcalidad. 
Es suficiente si uno trata simplemente de comprender un poco de este misterio cada dia. 

No hay que perder jamas la sagrada curiosidad. 

Albert Einstein 



The neural network approach to parton distributions 



The neural network approach to parton distributions 



7 



Agradecimentos 



La realizacion de una tesis doctoral, mas que una obra individual, es por encima de todo una 
tarea comunitaria. Por csto los agradccimicntos son cl momcnto cn cl cual sc ticncn cn cucnta la 
contribucion de todas aquellas personas que con sus aportaciones de todo tipo ban permitido que la 
presente tesis doctoral haya llegado a buen puerto. 

En primer higar quisicra agradcccr a Jose Ignacio Latorrc por acompafiarnic cstos cuatro afios 
como director de esta tesis doctoral. Desde sus primeras clases de Fisica de Altas Energias, me 
cautivo la pasion con la que explicaba, que es la misma con la que vive la aventura de la investigacion 
cicntifica. Jose Ignacio ha sido para mi no solo un maestro en el campo de la ciencia sino tambien 
un modelo por su honestidad en todos los ambitos. 

Tambien aprovecho para agradecer al codirector de esta tesis doctoral, Stefano Forte. Con 
Stcfano he aprcndido cl rigor ncccsario para scr un buen cicntifico. asi como la pasion ncccsaria para 
afrontar incluso los problemas mas dificiles sin desfallecer. Su ayuda continua han sido esenciales 
para llevar a cabo el trabajo de investigacion desarrollado en esta tesis doctoral. 

Muchas han sido las personas con las cualcs durante cstos ciiatro afios he compartido el placer 
de la investigacion cientifica, aprendiendo continuamente de todos ellos. De manera especial a los 
otros miembros de la NNPDF Collaboration, Andrea Piccione y Luigi Del Debbio, pero tambien a 
muchas otras personas: Antonio Pineda, Jorge Russo, Giovanni Ridolfi, Concha Gonzalez-Garcia, 
Michele Maltoni... Asimismo, querria agradecer a todas aquellas personas con las que he compartido 
siempre instructivas discusiones sobre fisica, y con las cuales he aprendido poco a poco el arte de ser 
investigador. La lista es muy larga: German Rodrigo, Joan Soto, Ignazio Scimemi, Thomas Becher, 
Einan Gardi ... 

Tambien querria agradecer a todos aquellos companeros del departamento de Estructura i Con- 

titucnts de la Materia con los que he pasado estos afios, compartiendo la aventura de realizar una 
tesis doctoral: Xavi, Enrique, Roman, Diego, Dani, Miriam, Luca, Jaume, Alex ... Pero especial- 
mente a Manel y Arnau, pues desde que nos conocimos en el primer curso de la licenciatura, nos 
hcmos acompahado a lo largo de toda la carrera y el doctorado. Parece que aquel dia tan lejano 
que apenas dislumbrabamos cuando haciamos Fonaments de Fisica ahora esta cerca para los tres. 
Tambien estoy agradecido a Joan Soto y Nuria Barberan por comunicarme tanto su pasion como 
su rigor en la docencia de las asignaturas que yo he colaborado a impartir en los dos liltimos ahos. 
Finalmente, agradecer a Oriol y a Rosa su continua ayuda en tantos detalles practicos que han 
surgido en estos afios. 

Son tantos los amigos que a lo largo de estos cuatro afios me han acompafiado en la apasionante 
aventura de la vida, haciendo posible renovar continuamente la pasion por todo, incluyendo la 
investigacion cientifica, que pido perdon por adelantado por todos aquellos que no tengo presentes. 
A Juan Ramon y a Roger especialmente por su inasequible apoyo y su continua ayuda en el cuidar 
la no siempre facil vocacion cientifica. A Lluis por comunicarme su pasion por el conocimiento y 
por la vida, y por proponerme una amistad que dura hasta hoy. A todos aquellos amigos que hemos 
vivido juntos estos ahos: Nestor, Raquel, Jorge, Xavi, Alfonso, Anna, Miquel C, Cristina. Miquel, 
Josep C, Josep M., Marc, Albert gracias una vez mas por vuestra infatigable compahia en este 
camino. 



The neural network approach to parton distributions 



8 



Tambicn cstoy profundamente agradecido a los amigos de fisica de Milan: Giuliano, Tommy, 
Betta, Paola, Maria, y a toda la asociacion cultural Euresis, por ayudarme a vivir mi vocacion 
cientifica con un horizonte abierto a toda la realidad. Tambien aprovecho para agradecer a los 
amigos de Madrid de la Asociacion Universitas, por su empefio continuo en vivir la vida academica 
siempre consciente de las razones de la propia vocacion, y por tanto posibilitando renovar siempre 
el interes tanto por la docencia como por la investigacion. 

Sin embargo, por encima de todo queria agradecer a mi familia todo el apoyo y la ayuda incansable 
que me han proporcionado a lo largo todos estos anos. Mi padre Eduardo ha sido desde siempre 
mi referenda, tanto a nivel academico como a nivel personal. La pasion y la alegria con la que mi 
padre vive su trabajo en la Universidad han sido mi mayor motivacion para empezar fisica primcro, 
y decidirme por la carrera academica despues. Mi madre Carmen tambien me ha ayudado siempre a 
valorar el estudio y el trabajo, educandome en el interes por toda la realidad, algo que nunca podre 
agradecer suficicntcmcnte. Mi licrmano Ignacio ha sido siempre modclo para mi, por la pasion y 
seriedad con la que ha vivido siempre el estudio, y ahora el trabajo. Tambien estoy muy agradecido 
a mi familia politica, Raiil, Margarita, Olga y Montserrat, por su continuo apoyo a todos los niveles 
en la realizacion dc csta tcsis doctoral. 

Finalmente, todo mi agradecimicnto va dirigido a mi mujer, Sonia. Gracias a ella, a su ayuda 
y su estimulo, he podido realizar la prcsente tesis doctoral. Su apoyo infatigable en todas las 
circunstancias ha sido lo que mc ha pcrmitido empezar cada dia c;on una ilusion plcnamente renovada, 
tanto en la investigacion como en la docencia. Por todo ello, no puedo mas que agradecerle otra 
vez todos estos anos en que nos hemos acompanado en la apasionante aventura de la vida y el 
matrimonio. 



The neural network approach to parton distributions 



The neural network approach to parton distributions 



The neural network approach to parton distributions 



11 



List of publications 
Research articles - published 

• J. Rojo and J. I. Latorre 1 , "Neural network parametrization of spectral functions 
from hadronic tau decays and determination of QCD vacuum condensates," JHEP 
0401, 055 (2004) arXiv:hep-ph/0401047 . 

• J. Brugues, J. Rojo and J. G. Russo [2], "Non-perturbative states in type II super- 
string theory from classical spinning membranes," Nucl. Phys. B 710, 117 (2005), 
(aJXiv:hep-th/0408174 . 

• L. Del Debbio, S. Forte, J. I. Latorre, A. Piccione and J. Rojo "Unbiased deter- 
mination of the proton structure function F2 with faithful uncertainty estimation" , 
JHEP 0503 (2005) 080, arXiv:hep-ph/0501067| . 

• S. Forte, G. Ridolfi, J. Rojo and M. Ubiali "Borel resummation of soft gluon 
radiation and higher twists," arXiv:hep-ph /060104 8 Phys. Lett. B 635 (2006) 313. 

• J. Rojo 0, "Neural network parametrization of the lepton energy spectrum in B meson 
decays", JHEP 0605 (2006) 040 arXiv:hep-ph/0601229 . 

• J. Mondejar, A. Pineda and J. Rojo [HI, "Heavy meson semileptonic differential decay 
rate in two dimensions in the large A^ci" |arXiv:hep-ph/0605248 

Research articles - in preparation 

• L. Del Debbio, S. Forte, J. I. Latorre, A. Piccione and J. Rojo [NNPDF Collaboration], 
"The neural network approach to parton distributions: The nonsinglet case", UB- 
ECM-PF 06/17. 

• C. Garcia-Gonzalez, M. Maltoni and J. Rojo, "Neural network parametrization of the 
atmospheric neutrino flux" , in preparation. 

• L. Del Debbio, S. Forte, J. I. Latorre, A. Piccione and J. Rojo [NNPDF Collabora- 
tion], "The neural network approach to parton distributions: The singlet case", in 
preparation. 

Conference proceedings 

• J. Rojo, "A probability measure in the space of spectral functions and structure func- 
tions" [7], proceedings of the QCD International Conference, Montpellier 2004, Nucl. 
Phys. B (Proc. Suppl.) 152 (2006) 57, arXiv:hep-ph/0 407147| 

• J. Rojo, L. Del Debbio, S. Forte, J. I. Latorre and A. Piccione [NNPDF Collaboration] 
[HI, "The neural network approach to parton fitting," proceedings of DIS05 workshop, 
|arXiv:hep^ h/0505044 

• J. Rojo, L. Del Debbio, S. Forte, J. I. Latorre and A. Piccione [NNPDF Collabora- 
tion] [5], "The neural network approach to parton distributions," contribution to the 
proceedings of the Hera-LHC workshop, hep-ph/0509059 



• J. Rojo, L. Del Debbio, S. Forte, J. I. Latorre and A. Piccione [NNPDF Collaboration] 
[TU|. "The neural network approach to parton fitting," proceedings of the ACAT05 
workshop, |hep-ph/0509067| 



The neural network approach to parton distributions 



Contents 



ll Introduccidnl 



12 ResumenI 

13 Introduction and motivationi 

14 Elem ents of perturbative QCD a nd global parton distributions analysis I 

k.l Overview of perturbativc QCDI 



E 



I4T 



4.1.1 Basics of Quantum Chro modvnamicsl . 



4.1.3 



4.1.4 



B meson scmilcptonic dccavsl 



Global fits of parton distribution functionsi . 
4.2.1 Th£_standar^^^^roach| 



4.2.2 Uncertainties in parton distributions! 



The neural network approach: General strategvl 
I5.I M pntc (^arlo sampling' of experi mental datal . . . 



I5T 



I5T 



^.1.2 ^^ati^ti^ al estimators: generated replicas vs. experimental datal 
^^^^^^^^^^^ 



5.2.1 



5.2.2 



Neural networks as unbiased interpolant^ 
^^^mm^^^^tegie^ 



5.2.3 Minimization algorithma 



Implementation of theoretical constraints! 
ValidationoFTheTesultX 



5.2.4 



5.3.1 Statistical estimators: probabilitv measure vs. experimental datal 



^^^^^^^gric^^s^m^tora^^^^^^^^^^^rob^n^^meas^^ 



l6 The neural network approach: Applicat ions! 



^^^^EC^^^^^crio^^^^^^^ni^^^^ec^3" 



6.2 



Structure functions in deep inelastic scatterine 
6.3 Lcpton energv spectrum in B meson decavj . 
^^~P^^^^^^^^i^o^^mctions! 



6.4.1 A new approach to parton evolution! 



^^^^^^^To^^^rie^^^^^istoA^^T 



\7 Conclusions and outlook! 
!8 Conclusione^ 



13 



The neural network approach to parton distributions 14 

IA Elem ents of statistical da ta analysis! 141 
A'^^cvj^^^FprobabUit^Theor yl 141 



A. 2 The Monte Carlo approach to error estimatiorj 142 



A. 3 Correct treatment of normalization unccrtaintiesl 144 



IB Overview of global parton fitd 149 



Chapter 1 

Introduccion 



El Large Hadron Collider (LHC, gran colisionador de hadrones) es un colisionador de protones a 
las energias mas elevadas conseguidas artificialmente por el hombre, situado en el Centre Europeo 
para la Investigacion Nuclear, el CERN en Ginebra (Suiza). Este acelerador de particulas esta 
actualmente en construccion, y se espera que las primeras colisiones de protones se produzcan antes 
del final del afio 2007. Las enormes energias que se alcanzaran en las colisiones entre protones en 
el LHC nos permitiran examinar las leyes fundamentales de la naturaleza a las menores distancias 
jamas investigadas. En particular, se estudiara en detalle el mecanismo de la ruptura de simetria 
electrodebil, mediada por la particula de Higgs, que es la responsable de dar las masas a todas las 
particulas elementales conocidas. Ademas, se espera que LHC nos proporcione informacion detallada 
sobre nueva fisica mas alia del Modelo Estandard de Fisica de Particulas, que ha sido construido y 
comprobado con enorme exito durante los ultimos 25 afios. En la fig. ll.ll Dodemos ver la localizacion 
del experimento LHC cerca de Ginebra, asi como uno de sus detectores, ATLAS, donde se examinan 
los resultados de la colisiones entre protones. 




Figure 1.1: 

La localizacion del tiinel de 27 kilometros donde esta situado el LHC, cerca de Ginebra (los Alpes pueden 
ser vislumbrados detras) (izquierda) y uno de sus detectores, ATLAS, que es tan grande como un edificio 

de siete pisos (derecha). 

Sin embargo, la extraccion de la nueva fisica de las colisiones proton-proton a altas energias 
que se produciran en el LHC es una tarea extremadamente complicada, por una serie de motivos 
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que detallaremos a continuacion. El principal de estos motivos es que esta nueva fisica estara 
escondida entre una multitud de procesos de fisica conocida, debido a la interaccion fuerte entre 
quarks y gluones (que son las particulas elementales que componen los protones) determinadas por 
el Modelo Estandard, en particular por la teoria conocida como Cromodinamica Cuantica (Quantum 
Chromodynamics, QCD). Estos procesos seran mucho mas frecuentes que las colisiones en donde 
se produzcan los efectos buscados de nueva fisica. Por lo tanto, el potencial de descubrimiento del 
LHC, asi como su habilidad para realizar medidas de precision de las propiedades de esta nueva 
fisica, dependen de una manera crucial de nuestra comprension quantitativa de los procesos de la 
interaccion fuerte y las incertidumbres que estos Uevan asociados. En la fig. 11.21 podemos ver el 
resultado de un proceso caracteristico de colision entre protones en el LHC. Es necesario recalcar 
que de los miles de millones de colisiones como esta que se produciran cada ano en el LHC, sera 
necesario extraer aquellas pocas que contienen informacion autenticamente relevante. 




Figure 1.2: 

Resultado de una colision tipica entre protones en el LHC: podemos observar las trazas que dejan los 
cientos de particulas producidas en cada colision, asi como la energia total que cada una de estas particulas 

Ueva asociada. 

Entre las diversas fuentes de incertidumbre asociadas a los procesos que involucran particulas con 
interaccion fuerte, una de las de mayor importancia es debida a las distribuciones de partones (Parton 
Distribution Functions, PDFs) en el interior del proton. Estas distribuciones son una medida de la 
cantidad de la energia total del proton que Ueva cada uno de sus diversos componentes, los quarks 
de diferente sabor y los gluones. Estas distribuciones de partones no pueden ser calculadas en teoria 
de perturbaciones, sino que necesitan ser extraidas de los datos experimentales que provienen de una 
gran cantidad de procesos de fisica de altas energias, como por ejemplo las colisiones profundamente 
inelasticas (Deep inelastic scattering, DIS) entre leptones (particulas elementales sin interaccion 
fuerte) y protones. 

Las distribuciones de partones las denotaremos por 

(Z,(x,Q2), i = l,...,27V/ + l , (1.1) 
donde es la energia tipica del proceso de colision, la variable x denota la fraccion de la energia 
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total del proton que Ueva el parton i, y tenemos una distribucion de partones independiente para 
cada quark y cada antiquark de diferente sabor, mas una para los gluones. Hay que tener en 
cuenta que en QCD las energias son grandes o pequenas en comparacion con la masa del proton 
Mp, que constituye la escala caracteristica de la teoria. Estas distribuciones de partones pueden 
ser determinadas gracias a la existencia del teorema conocido como el Teorema de la Factorizacion 
en Cromodinamica Cuantica. Segun este teorema, cualquier seccion eficaz de colision (que mide la 
probabilidad de que dos particulas colisionen y interaccionen) en procesos que involucren la interacion 
fuerte puede ser separada en el producto de dos terminos: un coeficiente que depende del proceso 
en cuestion, que podemos calcular en teoria de perturbaciones, y un conjunto de distribuciones 
de partones que son universales, es decir, que son independientes de los detalles particulares del 
proceso. En la fig. 11.31 observamos un esquema del interior de un proton, con los diferentes quarks 
interaccionando entre si mediante los gluones, y donde la flecha indica el spin, el momento angular 
interno de cada particula. El nioviniiento de estos quarks y gluones en el proton viene dictado por 
estas distribuciones de partones. 




Figure 1.3: 

El interior de un proton se compone de quarks y antiquarks de diferentes sabores, junto con gluones que 
mediante la interaccion fuerte mantienen a estos quarks confinados en el interior del proton. 

Por ejemplo, las funciones de estructura en colisiones profundamente inelasticas pueden escribirse 
como 

F{x, Q2) ^ c\ (x, as (Q')) ® Q^) , (1.2) 

es decir, como la convolucion entre un coeficiente Ci{x) que depende del proceso y las distribuciones 
de partones qi{x), que son identicas para todo proceso de colision a altas energias que involucre 
particulas con la interaccion fuerte en el estado inicial. Solamente es necesario determinar las 
distribuciones de partones a una escala inicial Qq relativamente pequefia, Qq ^ M^, pues su de- 
pendencia con la energia (denoniinada evolucion) viene dictada por la teoria de perturbaciones de 
la Cromodinamica Cuantica. Es necesario tener en cuenta que en general diferentes combinaciones 
de distribuciones de partones contribuyen a cada observable de manera diferente. Aunque la mayor 
fuente de informacion experimental sobre las distribuciones de partones proviene de las medidas de 
alta precision en los procesos de colision profundamente inelasticos descritos anteriormente, otros 
procesos son esenciales como produccion de jets (conjuntos de hadrones, es decir, de particulas que 
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interaccionan fuertemente) en colisiones de protones o el proceso de Drell-Yan, esto es, la produccion 
de parejas de leptones tambien en colisiones entre hadrones. 

El requisito de poder realizar fisica de precision en colisionadores de protones, como el futuro 
LHC, implica que es necesario determinar con la mayor exactitud posible no solamente las diferentes 
distribuciones de partones en el proton, gi(x, Qq), sino tambien las incertidumbres que estas tienen 
asociadas. Estas incertidumbres provienen del hecho de que puesto que las distribuciones de partones 
se extraen de datos experimentales, que tienen una precision finita, tambien estas tendran a su vez 
una precision finita, es decir, un error experimental asociado. Para ver la importancia capital que las 
distribuciones de partones tendran en el LHC, es necesario notar que una seccion eficaz de colision 
tipica tiene la forma 

cr(x,Q2) = {x,as (Q^)) (^q^{x,Q^)(g)qJ{x,Q'^) , (1.3) 

es decir, que depende de dos distribuciones de partones, una para cada de los partones que involucran 
las colisiones proton-proton. Este problema es particularmente grave pues la region cinematica que 
cubrira el LHC ha sido solamente parcialmente cubierta por aquellos experimentos que han sido 
usados para determinar las distribuciones de partones con anterioridad, como HERA y SLAC (ver fig. 
I1.4fl . y por lo tanto sera necesario extrapolar las distribuciones de partones a regiones donde nunca 
han sido medidas. Es importante, por lo tanto, controlar de manera muy precisa las incertidumbres 
asociadas a este proceso de extrapolacion. 




Figure 1.4: 

Esquema del experimento de colisiones profundamente inelasticas HERA en Hamburgo. 

El problema principal, en vista de todo lo descrito anteriormente, es determinar los errores de una 
funcion continua como son las distribuciones de partones, es decir, una densidad de probabilidad en 
el espacio de estas funciones, a partir de un conjunto finito de datos experimentales. Existe toda una 
serie de tecnicas estandard para determinar las distribuciones de partones a partir de las medidas 
experimentales, y diversas estrategias han sido usadas para estimar de maneras diversas ademas 
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los errores asociados a estas distribuciones. Todas estas estrategias han sido utiles para estimar 
de manera aproximada el tamano de estas incertidumbres, pero sin embargo sufren de un cierto 
mimero de problemas. Con las tecnicas estandard, las distribuciones de partones se parametrizan 
con funciones relativamente simples, tipicamente polinomios de la forma 

q,{x,Ql)^Ax'''{l~xy'{l + d,x + e,x^), i = 1, . . . , Nf + 1 , (1.4) 

donde los parametros Ai,bi . . . se determinan a partir de un fit de los dates experimentales. El primer 
problema aparece de immediato: nos estamos restringiendo al espacio de distribuciones de partones 
parametrizadas por la ec. 11.41 lo cual esta claramente injustificado pues no hay en la Cromodinamica 
Cuantica, la teoria que en principio determina la forma de estas distribuciones, nada que implique 
que las distribuciones de partones han de tener la forma funcional tan especifica que podemos 
observar en la ec. 11.41 Segundo, si se quieren propagar los errores que la ec. 11.41 lleva asociados a 
otros observables, como secciones eficaces de la forma de la ec 11.31 son necesarias aproximaciones 
de linealizacion, que como es bien conocido no son validas en un amplio rango de situaciones. 
Finalmente, el tercer inconveniente que presenta la tecnica estandard es que en presencia de datos 
experimentales que provienen de experimentos diferentes, la presencia de incompatibilidades entre 
los datos (por ejemplo, que la funcion do estructura, ec. ll.2l medida en dos experimentos distintos sea 
muy diferente) implica que la condicion para determinar los errores a partir de la funcion de error 
no es Ax^ — 1, como indica la estadistica basica, sino Ax^ — 100. En la practica esta eleccion implica 
que los errores de las distribuciones de partones no tienen ningun significado estadistico riguroso pues 
dependen de un parametro arbitrario A^^. En la fig. ll.Sl tenemos un ejemplo de una determinacion 
reciente de distribuciones de partones junto con las incertidumbres asociadas. Notemos que hay dos 
tipos de errores asociados: los errores estadisticos habituales (debido a disponer de un mimero finito 
de medidas) y los errores sistematicos, que dependen en general del proceso de medida y que estan 
correlacionados entre diferentes datos experimentales. 




Id' w- 10-' 1 



Figure 1.5: 

Un ejemplo de una determinacion reciente de distribuciones de partones. Notemos que cada distribucion 
tiene asociada una banda de incertidumbre, que es una medida de los errores de cada una de las 

distribuciones de partones. 
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Por todas las razones descritas hasta ahora, y en vista de la importancia crucial que una esti- 
macion fidedigna de las incertidumbres asociadas a las distribuciones de partones tiene para la fisica 
de precision en el LHC, es claramente deseable investigar estrategias alternativas para la determi- 
nacion de estas distribuciones que permitan superar y mejorar los problemas de la tecnica estandard 
descrita anteriormente. En jllj una novedosa tecnica fue presentada consistente en la determinacion 
de la densidad de probabilidad en el espacio de las funciones, que fue aplicada a la parametrizacion 
de funciones de estructura, como la ec. 11.21 en procesos de colision profundamente inelasticos entre 
leptones y protones. Esta novedosa tecnica usaba una combinacion de metodos Monte Carlo para la 
construccion de samplings de los datos experiment ales junto con redes neuronales artificiales como 
interpolantes universales. El uso de redes neuronales artificiales en lugar de funciones polinomicas 
como la eq. 11.41 en la parametrizacion de las distribuciones de partones permite eliminar la depen- 
dencia de los resultados en la forma funcional escogida arbitrariamente. Por su parte, la tecnica de 
los samplings Monte Carlo permite una estimacion estadisticamente rigurosa de las incertidumbres 
asociadas a la funcion que estamos parametrizando, y ademas la propagacion de estos errores a otros 
observables se puede realizar con toda generalidad, sin necesidad de aproximaciones de linearizacion. 

En la presente tesis doctoral hemos extendido los resultados presentados en ^T], aplicados a las 
funciones de estructura en colisiones profundamente inelasticas, en diversas direcciones. En primer 
lugar hemos aplicado esta tecnica general para parametrizar datos experimentales a otros procesos 
de interes en fisica de altas energias. Los procesos que han sido estudiados son las desintegraciones 
hadronicas del lepton tau, las desintegraciones semileptonicas del meson B, y las colisiones pro- 
fundamente inelasticas desde dos puntos de vista: a nivel de funciones de estructura, incluyendo 
todos los datos experimentales a nuestra disposicion, y a nivel de distribuciones de partones non- 
singlet, es decir, cuando el gluon se ha desacoplado del resto de distribuciones. En cada una de estas 
aplicaciones, la relacion entre los datos experimentales y la funcion a parametrizar era diferente, 
demostrando que la tecnica descrita en esta tesis es completamente general, valida para un gran 
niimero de situaciones. 

Ademas, en la presente tesis hemos extendido la estrategia original descrita en mediante 
la introduccion de nuevos algoritmos para entrenar las redes neuronales artificiales, en particular 
los conocidos como Algoritmos Geneticos. Estos algoritmos son necesarios para entrenar redes 
neuronales mediante funciones de error altamente no lincales, como nos sucede en las diferentes 
aplicaciones que describiremos en esta tesis. Finalmente, se han mejorado las tecnicas estadisticas 
utilizadas para la validacion de los resultados obtenidos, esto es, de la densidad de probabilidad con- 
truida en los diferentes casos. En el siguientc Capitulo describimos con cierto detalle los contenidos 
de la presente tesis doctoral y los resultados que han sido obtenidos. 



Chapter 2 



Resumen 



La presents tesis doctoral esta organizada de la manera que se describe con cierto detalle a con- 
tinuacion. En el Capitulo 4 hacemos un resumen de los elementos basicos de la Cromodinamica 
Cuantica, que como hemos explicado con anterioridad es el sector del Modelo Estandard de fisica 
de particulas que gobierna la Uamada interaccion fuerte entre particulas elementales. Asimismo, de- 
scribimos tambien aquellos procesos de fisica de altas energias en los cuales utilizaremos la estrategia 
general para parametrizar datos experimental que constituye el principal objeto de esta tesis. Junto 
con este resumen, presentamos tambien una descripcion detallada de la tecnica estandard, discutida 
en el Capitulo anterior, que se usa comunmente para extraer las distribuciones de partones con sus 
errores asociados a partir de un conjunto de datos experiment ales. 

A continuacion, en el Capitulo [S] describimos con todo lujo de detalles la tecnica general para 
construir la densidad de probabilidad de una funcion a partir de medidas experimentales, esto es, 
una tecnica para parametrizar datos experimentales, sin necesidad de hacer hipotesis alguna sobre la 
forma funcional de la funcion a parametrizar y con una estimacion fidedigna de las incertidumbres 
asociadas, que permite una propagacion de los errores a observables arbitrarios sin necesidad de 
aproximaciones lineales. Este Capitulo esta divido en tres partes, metodos Monte Carlo, redes 
neuronales artificiales y estimadores estadisticos, que procedemos a describir a continuacion. 

En la primera parte de este Capitulo se describen los metodos Monte Carlo que usamos para 
construir un sampling de los datos experimentales que contenga toda la informacion que nos pro- 
porcionan los experimentos, incluyendo los errores y las correlaciones. Asimismo, introducimos un 
conjunto de estimadores estadisticos que nos permiten estimar quantitativamente como este sampling 
Monte Carlo reproduce las caracteristicas de las medidas experimentales. Esta tecnica nos permite 
estimar de una manera fidedigna los errores asociados a la funcion a parametrizar, y demostraremos 
que es equivalente a los errores definidos a partir de una funcion de error cuando las aproximacion 
lineales en la propagacion de los errores son suficientes. 

En la segunda parte del CapituloOintroducimos las redes neuronales artificiales, que utilizaremos 
como interpoladores universales, asi como los diferentes metodos que usaremos para entrenar estas 
redes neuronales, esto es, para que aprendan los patterns presentes en los datos experimentales. Las 
redes neuronales artificiales son una herramienta habitual en diversos campos cientificos, desde la 
biologia a la computacion, y en particular se usan con asiduidad en fisica experimental de altas 
energias, en aplicaciones como clasificacion de eventos en funcion de sus propiedades. En la fig. 12.11 
mostramos una red neuronal artificial de la clase que utilizaremos en esta tesis doctoral, conocida 
como multi-layer feed-forward perceptron. Asimismo describiremos como nuestra tecnica permite la 
incorporacion de informacion teorica de maneras muy diversas, como reglas de suma o condiciones 
implicadas por la cinematica del proceso. 

Las redes neuronales artificiales tienen la interesante propiedad de que es es posible demostrar 
que cualquier funcion continua, indcpcndicntcmcntc do lo complicada que sea y del mimero de 
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Figure 2.1: 

Diagrama esquematico de una red neuronal artificial multicapa del tipo feed-forward. 



parametros que tenga, puede ser representada en terminos de una multi-layer feed-forward red neu- 
ronal artificial. Una segunda propiedad importante de las redes neuronales artificiales es que estas 
son muy eficientes en combinar de una manera optima la informacion experimental que proviene de 
diferentes medidas de una misma cantidad. Esto es, cuando la separacion entre datos experimentales 
en el espacio de variables de entrada es mas pequefia que una determinada longitud de correlacion, 
entonces la red neuronal combina de manera eficiente esta informacion experimental, de manera que 
el correspondiente pattern de salida es mas preciso que las medidas experimentales individuales. 

La utilidad de las redes neuronales artificiales es debida a la existencia de diversos algoritmos de 
entrenamiento. Este proceso se llama aprendizaje, pues no requiere un conocimiento a priori de la 
forma funcional que describe los datos experimentales. En particular, en la presente tesis doctoral 
hemos introducido los Uamados algoritmos geneticos para el entrenamiento de redes neuronales 
artificiales. Estos algoritmos geneticos tienen un gran mimero de ventajas respecto a los metodos 
de minimizacion deterministas que los hacen preferibles para problemas, como los que nos ocupan 
en la presente tesis doctoral, altamente no lineales y con un enorme espacio de parametros. 

Las ventajas de los algoritmos geneticos, que como su propio nombre indica estan inspirados en 
los mecanismos que se observan en la naturaleza sobre la evolucion y la seleccion natural, se pueden 
resumir en tres. Primero de todo, estos algoritmos trabajan simultaneamente en poblaciones de 
soluciones, lo que les permitc cxplorar regiones diferentes del espacio de parametros al mismo tiempo. 
Segundo, no necesitan ninguna informacion extra de la funcion a minimizar, como el gradiente de 
esta. Finalmente, estos algoritmos tienen una mezcla de elementos estocasticos aplicados bajo reglas 
deterministas, que mejora su eficiencia en problemas con muchos extremes locales, pero sin la perdida 
de efectividad que una busqueda meramente aleatoria implicaria. 

Finalmente, en la tercera parte del Capitulo, analizaremos aquellas herramientas estadisticas 
que nos permiten validar el resultado obtenido, esto es, determinar de una manera quantitativa 
como la densidad de probabilidad que hemos construido reproduce las carateristicas de los datos 
experimentales, asi como su dependencia con respecto diversos parametros, como por ejemplo el 
mimero de redes neuronales empleadas en la parametrizacion. Estas tecnicas estadisticas permiten 
tambien detereminar a partir de criterios solidos caracteristicas de la densidad de probabilidad como 
la longitud del entrenamiento de las redes neuronales o el valor optimo de la funcion de error x^- 

El conjunto de redes neuronales artificiales entrenadas en el conjunto de replicas Monte Carlo 
de los datos experimentales para una funcion F definen la densidad de probabilidad en el espacio 
de funciones F que estabamos buscando. Con esta densidad de probabilidad, podemos obtener los 
valores esperados para funcionales arbitrarios de la funcion F, T \F\^ a partir del conjunto de redes 
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neuronales de la manera siguiente, 

(J'iF]) = VFV [F] T[F] = jr[^(not)(fc)] ^ (2.I) 

J ^"P fc=i 

como con las distribuciones de probabilidad habituales. De esta manera podemos determinar la 
media de F, su variancia y su correlacion, utilizando las definiciones habituales de estos estimadores 
estadisticos. 

El Capitulo IHl describe con detalle cuatro aplicaciones diferentes de la tecnica introducida en el 
Capitulo|31 En primer lugar analizamos la parametrizacion de la funcion espectral pv-A{s)^ esto 
es, la diferencia entre las funciones espectrales correspondientes a los canales vectoriales y axiales, 
en las desintegraciones hadronicas del lepton r, que ban sido medidas con gran precision en el 
experimento LEP (Large Electron Positron collider, gran colisionador de electrones y positrones), 
el antecesor de LHC en el CERN. Como producto de este analisis determinamos los condensados 
de vacio de QCD, {Ok), que son parametros no perturbativos que deben extraerse de los datos 
experimentales. La determinacion de estos condensados a partir de los datos experimentales nos 
proporciona informacion sobre aspectos fundamentales de la Cromodinamica Quantica, como el 
mecanismo dc la ruptura de la simetria quiral, y ha sido objeto de intenso estudio en los liltimos 
anos. En la fig. 12.21 representamos los valores obtenidos para los condensados de vacio de QCD, 
(Ofc), como se describe en la seccion correspondiente de la tesis doctoral. 
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Figure 2.2: 

Los resultados para los condensados de vacio de la Cromodinamica Cuantica obtenidos en la presente tesis 
doctoral, como funcion del parametro de integracion sq. Las bandas de error corresponden a 
incertidumbres de l-cr que provienen de la parametrizacion de pv~a{s). 

En la segunda de las aplicaciones descritas en el CapituloEl generalizamos los resultados de 
referidos a la parametrizacion de funciones de estructura F{x,Q^), construyendo una parametrizacion 
de estas funciones en colisiones profundamente inelasticas del proton que incluye todos los datos ex- 
perimentales de que se dispone, en particular las medidas de alta precision del experimento HERA, 
especialmente en la region de pequefio x. En la fig. 12.31 podemos ver el resultado de este analisis 
comparado con los resultados originales obtenidos en ^T]. Notemos como en la region cinematica 
donde los datos experimentales incluidos en los dos analisis se solapan, que es la que corresponde 
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a valores grandes de la variable los dos resultados son perfectamente consistentes, como era de 
esperar. 
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Figure 2.3: 

Los resultados para la funcion de estructura del proton ^^(x, Q^) obtenidos en la presente tesis doctoral 

comparados con los resultados originales 

En tercer lugar, estudiamos las desintegraciones semileptonicas del meson B, comparamos nue- 
stros resultados con una serie de resultados teoricos y extraemos de la parametrizacion del espectro 
energetico del lepton la masa del quark b, mi,. La determinacion de parametros no perturbativos, 
como las masas de los quarks pesados o los elementos de la matriz de CKM, son una de las princi- 
pals motivaciones para los analisis teoricos y experimentales de las desintegraciones de los mesones 
B. En la Fig. 12.41 mostramos los resultados obtenidos para la parametrizacion del especto leptonico, 
para diferentes combinaciones de experimentos incluidos en el fit, como se describe en detalle en la 
secci6n l^31 

Esta aplicacion demuestra como nuestra tecnica general para parametrizar datos experimentales 
se puede aplicar a la reconstruccion de una funcion si la linica informacion experimental accesible 
precede de integrales truncadas de esta funcion. Esto permite una comparacion mas general de 
las predicciones teoricas con los datos experimentales, sin necesidad de hipotesis adicionales y con 
una estimacion fidedigna de los errores experimentales. Asimismo, el desarroUo de estas tecnicas 
permitira en un futuro aplicarlas a otras situaciones de interes en fisica de mesones B, como por 
ejemplo la parametrizacion de la shape function S{uj), que contiene los efectos no perturbativos 
dominantes en una serie de procesos como las desintagraciones radiativas de los mesones B. 

La aplicacion mas imporante de todas es descrita en la ultima seccion del Capitulo la 
parametrizacion de distribuciones de partones. Primero describimos una nueva tecnica para im- 
plementar la dependencia con la energia de estas distribuciones, y seguidamente describimos 
la construccion de la densidad de probabilidad en el espacio de las distribuciones de partones non- 
singlet, a partir de datos experimentales de funciones de estructura. En la Fig . 12 . 51 podemos observar 
los resultados para la distribucion de partones nonsinglet obtenida en la presente tesis doctoral con 
dos valores diferentes Qq del corte cinematico. Este corte cinematico es debido a que solo los 
datos experimentales de la funcion de estructura F^^{x, Q^) con suficientemente grande pueden 
tratarse mediante la teoria de perturbaciones de la Cromodinamica Cuantica. 

En la Fig. 12.61 tenemos el esquema del proceso utilizado en la parametrizacion de la distribucion 
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Figure 2.4: 

Comparacion de los resultados para la parametrizacion del espectron energetico leptonico con todos los 
experimentos incorporados en el fit, con el case en que solamente los dates del experimento Babar han side 

considerados. 



de partones non-singlet, a partir de datos experimentales de las funciones de estructura en colisiones 
profundamente inelasticas. Recordemos los tres pasos principales de nuestra tecnica descrita ante- 
riormente, como pueden verse en la Fig. 12.61 generacion Monte Carlo de replicas de las medidas 
experimentales, entrenamiento de redes neuronales artificiales que parametrizan la distribucion de 
partones non-singlet y finalmente la validacion estadistica de los resultados. 

Las tecnicas descritas en la Secci6n f6.4.2l proDorcionan la base para la realizacion de una parametrizacion 
de todas la distribuciones de partones, incluyendo el gluon, que son necesarias para aplicaciones 
fenomenologicas generales. En particular, es directo extender el formalismo de evolucion que de- 
scribiremos en el caso de las distribuciones nonsinglet al caso general con combinaciones arbitrarias 
de distribuciones de partones. De la misma manera, los datos experimentales que se usan en este 
caso son los del analisis de la Scccion lS!^ esto es, la parametrizacion de la funcion de estructura del 
proton F2{x,Q^). For lo tanto, con los resultados obtenidos en la presente tesis doctoral se tienen 
todos los ingredientes necesarios para obtener un set de todas las distribuciones de partones con la 
tecnica descrita en el Capitulo[3 

Finalmente, tras las conclusiones de la presente tesis doctoral, dos apendices incluyen material 
de referenda sobre analisis estadistico de los datos experimentales y sobre el estado actual de las 
determinaciones de las distribuciones de partones. En el primer apendice, dedicado al tratamiento 
estadistico de los datos experimentales, describimos con un modelo sencillo que se puede resolver 
exactamente las propiedades de los diferentes estimadores usados para el entrenamiento de las redes 
neuronales, asi como los efectos de un tratamiento incorrecto de los errores de normalizacion. En 
el segundo apendice resumimos el estado actual de las determinaciones globales de distribuciones 
de partones, analizando con detalle las caracteristicas de los analisis de las dos colaboraciones mas 
importantes, CTEQ y MRST, junto con los datos experimentales y las parametrizaciones de las 
distribuciones de partones utilizadas en cada caso. 
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Figure 2.5: 

Comparacion de los resultados para la distribucion de partones xqNs{x,Qo) para dos valores diferentes del 
corte cinematico: el de referenda > 3 GeV^, con uno mas conservative > 9 GeV^. 
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Figure 2.6: 

Estrategia general seguida para la parametrizacion de la distribucion de partones non-singlet qNs{x,Qo) a 
partir de datos experimentales de la funcion de estructura Fi''^ (x, Q^). 



Chapter 3 



Introduction and motivation 



The Large Hadron Collider (LHC) is a proton-proton collider at center of mass energy y/s = 14 TeV 
which is located at the Organisation Europeennc pour la Recherche Nucleaire (CERN), near Geneve 
in Switzerland (see Fig. 13.1(1 . Its construction is almost finished, and it is scheduled to deliver the 
first collisions by the end of 2007. The energies that will be achieved at the LHC will allow us to 
probe the smallest distances ever and will open a new era for particle physics, since it will investigate 
the true nature of electroweak symmetry breaking, and hopefully will deliver invaluable information 
of new physics beyond the Standard Model (SM) of particle interactions. However, possible new 
physics signals will appear together with a far more copious background from Standard Model 
processes, essentially from the interactions of quarks and gluons within the proton as determined by 
Quantum Chromodynamics (QCD), the sector of the SM that describes the strong interaction. 

Therefore, the discovery potential of LHC as well as its ability to also perform precision measure- 
ments of the new physics properties depends crucially on the understanding of the huge Standard 
Model background. Since the LHC is an hadron collider, to be able to perform precision predic- 
tions for different observables one needs first to understand quantitatively the underlying strong 
interaction processes and the associated uncertainties. Among these uncertainties, one of the most 
important ones comes from the parton distribution functions, which measure the momentum distri- 
bution of quarks and gluons inside the protons. Parton distributions cannot be computed from first 
principles, but rather they need to be extracted from experimental data from other hard scattering 
processes, like for example deep inelastic scattering. 

Parton distribution functions qi{x, Q^) can be determined from experimental data by means of 
the QCD factorization theorem, which states that any hard-scattering cross section can be separated 
into a process-dependent coefficient and a set of universal, process-independent parton distribution 
functions. That is, the factorization theorem relates observables like deep-inelastic scattering struc- 
ture functions F{x^ Q^) to a convolution of short-distance coefficient functions, Ci (x, as (O^)) i which 
can be computed in perturbation theory, and parton distribution functions. 



The dependence with the scale of the parton distributions is determined in QCD perturbation 
theory from the so-called parton evolution equations. Note that in general different combinations 
of parton distributions contribute to different observables. The inclusion of a wide variety of hard- 
scattering data is thus crucial to disentangle the various parton distributions. Even if the backbone 
of the determinations of parton distributions is the high precision deep-inelastic scattering data, 
essential experimental input comes from other measurements like jet production, heavy boson pro- 
duction or the Drell-Yan process. In Fig. 13. 21 we show the set of parton distribution functions from 
a recent global QCD analysis. 

The requirements of precision physics at hadron colliders, specially the Large Hadron Collider, 




(3.1) 
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Figure 3.1: 

The location of tlie LHC experiment near Geneve (left) and the one of its detectors, ATLAS (right). 



determine that it is now mandatory to determine accurately not only the parton distributions them- 
selves but also the uncertainties associated to them. Note that a detailed knowledge of parton 
distributions and their associated uncertainties is essential if one wants to perform accurate mea- 
surements, since a typical LHC cross-section a{x,Q^) reads 

cr(x,Q2) = C^J {x,as (Q^)) (g> q^ix,Q^) (g> qj{x,Q^) , (3.2) 

which involves the product of two parton distributions from each of the two partons in the initial 
state of the proton-proton collision. The problem is specially acute since the kinematic region 
covered by the LHC overlaps only partially with the kinematical range of those experiments used 
to determine the parton distributions, like HERA, as can be seen in Fig. 13.31 and therefore one 
has to extrapolate parton distribution into an unknown kinematical region. Also for this reason 
it is essential to determine the uncertainties in parton distributions and propagate them into the 
extrapolation region probed by LHC. 

The main problem to be faced here is that one is trying to determine an uncertainty on a function, 
i.e., a probability measure on a space of functions, and to extract it from a finite set of experimental 
data. Within the framework of the standard approach to determine parton distributions from 
experimental data, several techniques have been constructed to assess these uncertainties. However, 
even if all these techniques have been useful to estimate the size of the uncertainties, they suffer from 
several drawbacks. First of all, since in the standard approach parton distributions are parametrized 
with relatively simple functional forms like 

q^{x, Ql) = A,x^^ (1 - xY^ (1 + d,x + e,x^ + ..) , (3.3) 

where the parameters Ai,bi, . . . are fitted from experimental data, the estimation of the uncertainties 
is restricted to the space parametrized by Eq. 13. 31 and therefore depends heavily on the assumptions 
done for the non-perturbative shapes of the parton distributions. Second, in order to propagate the 
uncertainties in the parton distributions to an arbitrary observable, linearized approximations in 
the error propagation have to be used, whose validity is at best doubtful. Finally, in the presence 
of experimental data from different experiments, it has been argued that incompatibility between 
different experiments force that the tolerance condition in the used to estimate the errors is not 
the textbook value Ax^ = 1 but a much larger value A^^ = 50 or 100. Even if this choice can be 
justified to some extent, in practice parton uncertainties determined with this condition lose their 
statistical meaning, since they depend on the choice of the free parameter Ax^- 
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Figure 3.2: 

Parton distribution functions gi(a;,(3o) determined from experimental data in a recent QCD global 

analysis |12) . 



Therefore, in view of the crucial importance of parton distribution functions for LHC precision 
physics, it is worth investigating alternative approaches for the determination of parton distributions 
and the associated uncertainties that bypass the problems of the standard approach discussed above. 
In Ref. im a novel technique was presented to determine the associated probability measure in the 
space of a function, which was applied to the parametrization of deep inelastic structure functions. 
This technique uses a combination of Monte Carlo samplings of the experimental data together with 
artificial neural networks as universal unbiased interpolants. The use of neural networks instead 
of fixed functional forms as in Eq. avoids the bias introduced by the assumption of a given 
functional form, and the Monte Carlo sampling provides a statistically rigorous estimation of the 
function uncertainties, which allows for a general error propagation to arbitrary observables without 
the need of linearized approximations. 

In this thesis we extend the work of Ref. j^l^ in different ways: first we have applied the general 
strategy to parametrize experimental data to other processes of interest, in particular to the case 
of parton distribution functions. In all these cases the relation between the parametrized quantity 
and the experimental data was completely different, so it has been shown that the approach of Ref. 
|11 | is suited for a variety of situations. Second, we have extended the basic technique with the 
introduction of new minimization algorithms, specially genetic algorithms, as well as new statistical 
estimators to assess the stability of the obtained probability measure in different aspects. Finally, 
several improvements of the neural network training have been introduced to optimize its efhciency. 

The outline of the present thesis is as follows. In Chapter 4 we review the basic elements 
of Quantum Chromodynamics, as well as those high energy processes that will be used later in 
the thesis. We present also a review of the standard approach to parametrize parton distribution 

iSee also Ref. [T^ 
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Figure 3.3: 

The kinematical coverage of the LHC compared to that of the HERA coUider and fixed target 

deep-inelastic scattering experiments. 



functions with their associated uncertainties. In Chapter |31 we introduce the general technique to 
parametrize experimental data in an unbiased way with faithful estimation of the uncertainties, 
which is the main subject of this thesis. Then in Chapter |B1 we present four applications of the 
general strategy, and special attention is paid to the most important one, the parametrization of 
the nonsinglet parton distribution. Two appendices summarize background material on statistical 
analysis of experimental data and on the current status of the standard approach to global fits of 
parton distributions. 



Chapter 4 



Elements of perturbative QCD and 
global parton distributions analysis 

In this first Chapter we review the basic elements of Quantum Chromodynamics (QCD), the gauge 
theory of the strong interactions. After a brief description of the theoretical foundations of QCD, 
we describe in some detail the deep-inelastic scattering process. This process is the most important 
source of information on parton distribution functions, whose parametrization, as we have discussed 
in the Introduction, is the main motivation for the set of techniques developed in the present thesis. 
We will consider also two other high energy processes, since they have provided testing grounds for 
our strategy to parametrize experimental data: the semileptonic decays of the B meson and the 
hadronic tau decays. 

Parton distribution functions, as has been discussed in the Introduction, have to extracted from 
experimental data, by means of a QCD analysis of a variety of hard scattering measurements. In 
the second part of this Chapter we present a summary of the standard approach to global fits of 
parton distributions, and we discuss in some detail the different methods, with their advantages and 
drawbacks, which are commonly used to estimate the associated uncertainties of parton distribution 
functions. 

4.1 Overview of perturbative QCD 
4.1.1 Basics of Quantum Chromodynamics 

Quantum Chromodynamics (QCD) is the gauge theory that describes the strong interaction (see 
for example Refs. |14[ I15j and references therein). Gauge invariance under the group SU(N) and 
renormalizability determine completely the form of the Lagrangian. The QCD Lagrangian describes 
the strong interaction between quarks and gluons, and it reads 

Nf 

-Cqcd = ^ g» i^rD^ - m,) - -F^^F^, . (4.1) 

Let us describe the elements of the above equation. The N f spinor quark fields of different flavor 
are labeled by g^, each with mass m^. The Dirac matrices 7^ appear due to the fermionic nature of 
quarks, and are defined by the anticommutation relation {7^, 7"^} = 25'^". The covariant derivative 
reads in terms of the gluon field 

(^A.)afc-5/^'^afc + i5(*X)a6 ' (4-2) 
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where fi is a Lorentz index, fi — 0,1, 2, 3, and a, b are color indices for the fundamental representation, 
a,b — 1,...,N. The matrices t^, where A is a color index in the adjoint representation, A = 
1, . . . , N'^ + 1, are the SU(N) generating matrices in the fundamental representation. The last term 
in Eq. 14.11 is the field-strength tensor for the gluon field, 

= d,At - d.A^ ~ gf^^'^'A^A'^ , (4.3) 

where f^^'~^ are the structure constants of SU(3), defined by the commutation relation 

^ (4 4) 

The last term in Eq. 14.31 describes the self-interaction of the gluons, a typical feature of non-abelian 
gauge theories like QCD which renders the theory asymptotically free, as discussed below. Finally, 
g is the QCD coupling constant, which is, together with the quark masses, the only free parameter 
of the theory. 

From this Lagrangian, using standard rules of Quantum Field Theory |16| one can compute 
several observables, like cross sections or decay rates, in a perturbative series expansion in powers 
of as = g^/^TT. Radiative quantum corrections induce a dependence of the strong coupling with 
respect to the typical energy of the process E, which at lowest order reads 

as^as{E) ^ ^ , (4.5) 

where /3o is the first coefficient of the QCD [3 function that determines the running of as with the 
energy. The main feature of QCD can be seen from Eq. 14.51 the theory is asymptotically free 
|17[ I18| , which means that the coupling constant vanishes when the typical energies of the process 
become very large with respect to the typical scale of the theory, Aqcd ■ Asymptotic freedom allows 
us to apply QCD to many high energy processes for which perturbation theory is meaningful, since 
in this case E ^ Aqcd and therefore as{E) ^ 1. On the other hand, the same asymptotic freedom 
property implies that the theory becomes strongly coupled at low energies, E < Aqcd, and in this 
non-perturbative regime the standard tools of perturbation theory are useless, and one has to resort 
to other methods, like lattice computations jlS]. In Fig. 14.11 we show a comparison of different 
extractions of the strong coupling as{E) with the theoretical QCD predictions 20 . 

For single scale observables, that is, for processes which depend only on a single hard scale 
(where again a hard scale is a energy scale which satisfies the condition ^ Aqcd), the 
perturbative expansion in powers of the strong coupling reads, for those processes in which we are 
interested in, 

oo 

i?(Q2) = ^a„a,(Q2)" , (4.6) 

71=1 

where without loss of generality we have assumed that the perturbative expansion of the observable 
R(Q^) starts at order as (Q^)- The best example of this first case is the total cross section of e+e~ 
going to hadrons, where in this case the hard scale is identified with the center of mass energy of 
the collision, = s. 

Perturbative expansions become more complicated as the number of hard scales present in the 
process begins to increase. In this case it is not always true that observables can be expanded in a 
simple power series in as {Q^)- Let us consider to be definite the deep-inelastic scattering processes, 
which is the best known example of a two-scale process, and which will be discussed in detail in 
the next Section. In this case the two hard scales will be denoted by and W^. In deep- inelastic 
scattering, as we will see in brief, the nonsinglet structure function at leading order is given by 



as [Q ) 



(4.7) 
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Figure 4.1: 

Comparison of different determinations of the strong coupling (Q) at several energies Q from a variety of 

processes, as summarized in Ref. |2()| . 



where N is the conjugate variable of the ratio x = /{Q^ + W^), and therefore F^^ cannot be 
expanded in integer powers of as (Q^) ■ Another example is the singlet structure function at small 
X 123) which reads 



exp 



(v/ln(l/a;)lnln(a, (g2))) , (4.8) 



which again is not expandable in a simple series in powers of as {Q^)- 

In some cases, however, perturbative computations which depends on two hard scales have a 
perturbative expansion of a similar form of Eq. 14.61 that is, a expansion in integer powers of the 
strong coupling, like the deep-inelastic scattering coefhcient functions, which are of the form 

C{Q^W')=Y.as{Q')'b,(^) . (4.9) 



fc=i 



Note that unlike Eq. I4.t)l the coefhcients of the expansion bk are not pure numbers but functions of the 
ratio of scales Q^/W^. In these coefficients bk one can encounter large logarithmic terms of the form 
(inQ^ /W^Y . These logarithmic terms can become large in some kinematical regions and need to 
be resummed to all orders J22' in order to trust perturbation theory. Note that typical perturbative 
expansions like Eq. 14.61 and 14.91 are at best asymptotic expansions, and often their large order 
behavior introduces divergences, leading to ambiguities in the value of the summed perturbative 
expansion related to nonperturbative corrections |?!T). Note also that since the coupling as (Q^) 
increases as the value decreases, the perturbative expansions Eqs. 14.61 and 14.91 are meaningful 
only for large enough values of ^ Aqcd, that is, for the so-called hard processes [21] . 

The foundation for results like Eq. 14.61 and 14.91 is the Operator Product Expansion. This 
expansion allows to organize any quantity in a scries in inverse powers of Q^, the typical energy scale 
of the process, by means of an expansion of composite operators in terms of simpler operators of the 
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appropiate dimension. The Operator Product Expansion |25j can be used to parametrize higher- 
order nonperturbative effects that are mostly relevant at low in terms of matrix elements of local 
operators, and in this way to extend the validity range of theoretical predictions. For example, for 
a single scale observable, the operator product expansion reads schematically 



where (Ok) are nonperturbative expectation values of operators with the appropriate dimensions, 
and where the leading order result corresponds to the unit operator, Oq = 1, 



It is clear from Eq. 14.101 that at large enough values of only the leading term in the OPE is 
relevant for phenomenological purposes. The nonperturbative matrix elements (Ok) in Eq. I4.1UI 
can be extracted from experimental data in some processes and then be used in other processes to 
increase the accuracy of the theoretical prediction, as in the case of parton distribution functions, 
which will be analyzed in detail in Section l4.1.2l 

Now we review the high energy processes that will be used in applications of the general technique 
to parametrize experimental data which is the main subject of this thesis: deep-inelastic scattering, 
semileptonic B meson decays and hadronic tau decays. 

4.1.2 Deep-inelastic scattering 

For a variety of reasons deep-inelastic scattering (DIS), the high energy scattering of leptons and 
hadrons, is one of the most important processes in Quantum Chromodynamics. The original, and 
still the most powerful, test of perturbative QCD is the breaking of Bjorken scaling ^Hl in DIS, that 
is, the logarithmic dependence of deep-inelastic structure functions with the momentum transfer 
Q"^. Nowadays, deep inelastic structure functions analyses not only provide some of the most precise 
tests of the theory but also determine the momentum distributions of partons in hadrons, which are 
an essential input in predicting cross section in high energy hadron collisions. 

Deep-inelastic scattering is probably the best theoretically understood process in perturbative 
QCD. The full next-to-next-to-leading order computation was recently finished |27l I28j . and also 
threshold resummation at the next-to- next-to-leading logarithmic accuracy can be implemented |29| . 
On the experimental side, deep inelastic scattering is the high energy process involving strongly in- 
teracting particles which has been measured experimentally with the highest accuracy. For example, 
the lepton-proton collider HERA POI has measured deep- inelastic scattering cross-sections with 1% 
accuracy in a wide kinematical range. 

Moreover, as we have mentioned before, deep-inelastic scattering is essential to be able to use 
perturbative QCD in other processes involving hadrons in the initial state, like proton-proton colli- 
sions at the LHC. This is so because deep-inelastic scattering provides the backbone information on 
the parton distribution functions (PDFs) of the nucleon ^T]- As will be discussed in more detail, a 
detailed knowledge of parton distribution functions and its associated uncertainties is an essential 
input for precision LHC phenomenology. 

Now we present the basic formulae that describe deep-inelastic scattering, and that will be useful 
in Chapter El Deep- inelastic scattering is the high-energy collision of a lepton (an electron, a muon 
or a neutrino) against a hadronic target (a proton or a nucleus). The kinematics of this process (see 
Fig. I4.2|l are determined in terms of two variables x and Q^, defined as 




(4.10) 



OO 




(4.11) 



n=l 




(4.12) 



X = 



2p-q' 
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where q is the momentum carried by the virtual gauge boson and p is the momentum carried by the 
incoming proton. Other important variables are the invariant mass of the final hadronic state 
and the inelasticity y, given by 



,2 _ ^2 1-2; _ q-p 



W'^Q'^, y=lr^, (4.13) 

where k is the momentum carried by the initial state lepton. The applicability of perturbation 
theory requires that both scales and are large, because if not either higher twist corrections^ 
are relevant, or the perturbative expansion breaks down due to the presence of large logarithms of 
the form as (Q^)^ln'^(l — x). The kinematical cut in W"^ can be lowered by including the effects 
of threshold resummation 22] ■ Note that even if is large, W"^ can be small provided x is large 
enough. The relation between perturbative threshold resummation and higher twist nonperturbative 
corrections is still an open issue 0]. 




Figure 4.2: 

The deep-inelastic scattering process: the hard scattering of a lepton off a hadron, typically a 

proton. 



The deep-inelastic scattering cross section can be decomposed using kinematics and Lorentz 
invariance in terms of structure functions Fi{x,Q^). These structure functions parametrize the 
structure of the proton as seen by the virtual gauge boson. If the incoming lepton is a charged 
lepton (an electron or a muon) then for <C the double differential deep-inelastic scattering 
cross section reads 



2 r 



1 



(1 + (1 - yf) F^{x, Q2) + ^ ^p^^^^ q2) _ 2^p^(^^^ q2)) 



(4.14) 



where a is the electromagnetic coupling. If the incoming lepton is a neutrino, then the cross section 

^The twist expansion is the operator product expansion applied to the deep-inelastic scattering process. 
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reads 



G%ME 



dxdy 



y 



M 
2E 



xy^ F^{x, Q2) + y^xF'({x, Q^) + J/ (l - |) xF^{x, ( 



(4.15) 



where Gf is the Fermi constant, M the mass of the target hadron and E the neutrino energy 
in the hadron rest frame. Note that Eq. 14.151 holds both for charged current {W^ exchange) 
and neutral current {Z exchange) neutrino scattering, even if the decomposition of the structure 
functions F^{x, Q^) in terms of parton distributions is different in the two cases All the structure 
functions defined above have been measured in several experiments, and the most precisely known 
is the charged lepton scattering neutral current structure function F2{x,Q^), thanks to the high 
precision measurements at HERA and in fixed target experiments. In Fig. 14.31 we show a summary 
of available data on this structure function from different experiments. Note that the effects of 
QCD evolution, that is, the dependence of F2{x,Q-^) with the scale Q^, is clearly observed in the 
experimental data, specially in the small- a; region. 




Figure 4.3: 

The deep-inelastic structure function Fj'(a;,Q^) as measured by several different experiments (HERA, 
BCDMS and NMC). Note the dependence of F^{x, Q^) with Q^, as dictated by perturbative QCD. 



Structure functions Fi{x,Q'^) depend on the momentum distribution of partons (quarks and 
gluons) inside the proton, which are determined by low energy nonperturbative dynamics, and 
therefore cannot be computed in perturbation theory. However, each structure function Fi{x,Q'^), 
using the factorization theorem |34|. can be written as a convolution of hard-scattering coefBcients 
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Cij{x, as{Q'^)), which depend only on the short-distance (perturbative) physics, and parton distri- 
bution functions qj{x,Q^) PH, which parametrize the (non-perturbative) structure of the proton. 
The factorized structure function reads 



J X y \ y 



(4.16) 



The coefficient functions Ci(x, as((3^)) can be computed in perturbation theory as a power series 
expansion in as{Q^). Parton distribution can be formaUy defined 36_ by means of suitable operator 
matrix elements in the proton, 

(x, Q2) ^ J ^|_e— (0, y~) W[y, 0]7+^(0)b)^ , (4.17) 

where W[y] is a Wilson line (a path-ordered exponential of the gluon field) and the parton distribu- 
tions are renormalized at the scale = . Since these parton distributions are non-perturbative, 
they must be determined from experimental data. The techniques used to extract parton distribu- 
tions from hard scattering data together with the associated uncertainties will be discussed in the 
next Section. 

At leading order in the strong coupling as {Q^), the expressions for the different structure func- 
tions in terms of parton distribution functions are relatively simple. For example, for charged lepton 
scattering off a proton target one has 



^ {u + u + c + c) + ^ {d + d + s + s) 



{x,Q^) , (4.18) 



2xFiix,Q^) = F2ix,Q^) , (4.19) 

where the last relation is the Callan-Gross relation |27| . The parity violating structure function xF^ 
in charged lepton scattering can be neglected in the <C Mf limit, since only contains contributions 
from Z boson exchange. For neutrino-proton scattering the appropriate relations are given by 

F^{x,Q^) = 2x[d + s + u + c]{x,Q^) , (4.20) 

xF^ (x, Q^) =2x[d + s-u-c] {x, Q2) . (4.21) 

In the above expressions, u{x,Q^) is the parton distribution function of the u-quark in the proton, 
d{x,Q'^) that of the d-quark, and so on. 

At very high energies, > M|, one has to take into account the contribution of Z boson 
exchange in neutral current charged lepton scattering. The generalization of Eq. 14.141 which incor- 
porates the complete neutral current exchange is given by 



xy'F.^^ix, Q') + (1 + y)Ff ^(x, Q') + y (l - |) F,^^{x, Q^)] , (4.22) 



dxdQ^ xC 

where in terms of parton distribution functions one has 

Nf 



Fi'^ix) = 2a;f^i^^(x) = ^ Mx) + g" (x)] C,iQ^) , (4.23) 



1=1 

Nf 



xF,^^{x) - }_J [q,{x) - Ux)] D,{Q') , (4.24) 
1=1 



C,(g2) = - 2e,V,V,Pz + (K' + Al){V^ + A^)Pl , (4.25) 
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Dg{Q') = -2e'^A,AgPz + W^A^VgAgPl Pz = gj^^ , (4-26) 

where are the electromagnetic charges of the quarks and Vi and Ai are the vector and axial 
couplings of the fermions to the Z boson. Note the appearance of the parity-violating structure 
function F3(x, Q^), which is sensitive to the helicity of the incoming leptons (that is, it is different 
for example for an electron and for a positron). 

Even if parton distribution functions (?i(a;,Qg) are of non-perturbative origin, it can be shown 
that their dependence with the scale is dictated by perturbative QCD, provided the scale is 
large enough. The dependence of the parton distributions with the scale Q^, also known as their 
evolution with Q^, is dictated by the perturbative DGLAP [21112111301 evolution equations. These 
equations can be used to evolve with any combination of parton distributions, however, their 
form is much simpler if suitable combinations are defined. For nonsinglet combinations of parton 
distributions, defined as differences between quark distributions, 

qNS,ij{x, Ql) = {qi - qj) {x, Ql) , (4.27) 

where i, j label either a quark or an antiquark, the DGLAP evolution equation reads 

dqNsix,Q^) asiQ^) dy 



'-PNs{y,as{Q^))qNs(^^,Q^^ , (4.28) 



where PNs{x,as (Q^)) are the non-singlet splitting functions. These splitting functions can be 
computed perturbatively as an expansion in powers of ag (Q^) ■ For instance, the leading order 
expression for the nonsinglet splitting function is given by 



3(0) . s _ / 1 + 3 



P'n^ (x) = Cf [jY^ + 2^(^ - ■ ^^'^^^ 

It is clear from its definition that the gluon is decoupled from the evolution of nonsinglet parton 
distributions. The remaining independent combination of parton distribution is called the singlet 
parton distribution, defined as the sum of all quark and anti-quark flavors, 

E(a;,Q2)^^(^^(3,,Q2)+5^(a;,Q2)) (4 30) 

1=1 

In the singlet sector, the DGLAP equation is a 2-dimensional matrix equation. In this case the singlet 
distribution evolves coupled to the gluon parton distribution using the singlet DGLAP evolution 
equation. 



dlnQ2 ^ g{x,Q^) J 27T y \ Pgg{y) Pggiy) J \ g{x/y,Q^) 



(4.31) 



in terms of the singlet matrix of splitting functions. 

The DGLAP evolution equations, Eqns. 14.281 and 14.311 can be solved using a wide variety of 
techniques. A particularly useful method is the transformation of the evolution equations to Mellin 
space, also known as moment space, using the integral transformation 

q4N,Q^)= f dxx'^-\,{x,Q^) . (4.32) 



In Mellin space, the nonsinglet DGLAP evolution equation, Eq. 14.281 is no longer an integro- 
differential equation but rather a simple differential equation, 

dqNs{N,Q^) as{Q'^) , I at n-i\ ( a 
^Y^Q2 = 2-K )) INS [N,Q ) , (4.33) 
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where the anomalous dimension jns (^N, as{Q'^)) is the Melhn transform of the sphtting function, 



7Ars {N,a,{Q^)) = / dxx 
Jo 



N-l 



Pns {x,as{Q^)) 



(4.34) 



The main advantage of this method is that in MeUin space the DGLAP equations can be solved 
analytically. In the nonsinglet section, for example, Eq. 14.331 has the solution 



QNsiN, Q2) = r (TV, as (Q2) , as (Ql)) qNs{N, Ql) , 



(4.35) 



in terms of an evolution factor F (N) which is constructed in terms of anomalous dimensions, jNsiN). 
Similar results hold for the singlet DGLAP equation Eq. I4.31l in Mellin space. Once the evolution 
equations have been solved in Mellin space, one needs to invert back to x-space, using the inverse 
Mellin transformation, 

q{x,Q^)^^ [ dxx-^q{N,Q^) , (4.36) 
Zm J (J 

where the integration contour C in the complex N plane is parallel to the imaginary axis and to the 
right of all the singularities of the integrand. Except in very special cases, this inverse transformation 
can only be performed by numerical integration. 

To end with this review of deep-inelastic scattering, let us describe sum rules of structure func- 
tions. Sum rules are integrals over certain combinations of structure functions which have a particular 
value in the parton model [411 . Sum rules are extremely useful for many purposes, from precision 
determinations of the strong coupling, tests of perturbative QCD and to gain more insight on non- 
perturbative dynamics. A first example is the Gross-Llewellyn Smith sum rule |42| . which is related 
to the parity- violating structure function F^{x,Q^) in neutrino- nucleus scattering. The GLS sum 
rule is an exact consequence of QCD in the limit of isospin symmetry, and reads 



Igls — 2 



dxiF,_ 



(f;^^{x,q^) + fJp{x,q^)) 



= 3 



fc=i 



kOis 



(4.37) 



were the terms of O (Q^)'^^ are corrections from perturbative QCD. 

The second example of a deep-inelastic scattering sum rule is the Gottfried sum rule 021 for 
charged lepton scattering, which depends on the difference of structure functions in the proton and 
in the neutron, also known as the nonsinglet structure function F^^ {x, Q^), 



Ig= f-{F^^{x,Q^)-Fr{x,Q'))^ f-F^'{x,Q^) = \+ fdx{u~l). (4, 

Jo X Jo X 6 Jq 



38) 



The first term in the right side is the naive result of the parton model. Note that oppositely to 
the case of the GLS sum rule discussed above, the Gottfried sum rule has no justification in full 
QCD, and indeed it is violated, as was observed from experimental measurements of the nonsinglet 
structure function by the NMC collaboration (44j. In this case perturbative corrections turn out to 
be negligible. The measurement of this sum rule showed that isospin symmetry in the sea quarks 
does not hold for the proton, that is, u{x) ^ d{x). 



4.1.3 B meson semileptonic decays 

Hadronic states with quarks with large masses, the so called heavy quarks, are of considerable 
interest for QCD for a variety of reasons. Perturbative QCD is not applicable to strongly interacting 
bound states composed of light quarks only, since all the scales involved in the problem are of the 
same size of the typical hadronic scale, Aqcd- However, soon after the advent of QCD it was realized 
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that the situation was considerably different for hadrons with at least one heavy quark, where heavy 
means the condition that its mass mn satisfies 

mn > Aqcd • (4.39) 

This is so because the heavy quark mass provides a large mass scale, so that perturbative com- 
putations of a variety of processes related to this system, like decay rates or spectroscopy, can be 
computed in perturbation theory as an expansion in as {fn^^ <§; 1. The heavy quark condition Eq. 
14.391 is satisfied by the charm, bottom and top quarks, even if in the latter case it is of no practical 
interest since the top quark does not hadronizc due to its short lifetime. 

For this reason, hadronic states with b quarks have become a theoretical laboratory for pertur- 
bative QCD. It has proved to be an useful environment for the development of several effective field 
theories, like Non Relativistic QCD 05] , Heavy Quark Effective Theories 0B| and more recently 
the Soft CoUincar Effective Theory |47[ US] . All these effective theories make use, in one form or 
another, of the condition Eq. l4.39l to simplify the dynamics of the relevant processes. 

In this thesis we will focus our attention, for reasons to be described in the following, to the 
inclusive semileptonic decays of B mesons into charmed final states. This process is useful to de- 
termine the CKM matrix element, Vet with high accuracy as well as the b quark mass m;,. The 
process, B — XJi', is represented in Fig. 14.41 The most inclusive observable that can be measured 
in this process is the total semileptonic decay rate. This decay rate can be written as perturbative 
series expansion in q;s(toj), where as has been discussed before, the b quark mass mf, plays the 
role of the hard scale of the process, as well as an expansion in inverse powers of the b quark mass 
parametrized by nonperturbative matrix elements of local operators, in the OPE spirit [49(. This 
expansion in powers of l/rri}, is also known as the heavy quark expansion |50j . The inclusion of the 
leading nonperturbative effects through the heavy quark expansion is crucial to analyze this process, 
and in general, heavy meson physics, since the heavy quark masses are not so large compared to 
Aqcd- Therefore, typical nonperturbative corrections of the order O (AqcD/^T-h) have to be taken 
into account for precision theoretical computations. 

With this caveats, we can write for the total B meson semileptonic decay rate the following 
expression 

r (B -> XJl.) - ^l^lKbP (1 + A™) Ape,t(p) • 



2mi J 2ml 



(4.40) 



where the phase space factors are given by 



zo(p) = l-8p-)-8p^-p4_i2p2iogp, p=^, (4.41) 

mi 

g{p) = -9 + 24p - 72p2 + 73p3 _ i5p4 _ 25^2 p ^ (4 42) 

and the leading nonperturbative effects are parametrized by Ai and A2, which are matrix elements 
in the B meson of local operators with the appropriate dimension, stands for the electroweak 
radiative corrections and Apert(p), 

00 

Vrt(p) ^^Ck (p) as {mlf , (4.43) 

fc=0 

includes the QCD perturbative corrections. 
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Figure 4.4: 

The semileptonic decay of a B meson into a ctiarmed liadronic state and a lepton-neutrino pair. 



Much more detailed information on the underlying dynamics of the process can be obtained by 
analyzing less inclusive observables. These observables can be constructed from the triple differential 
decay rate for this process, 

Bip) ^ l{pi) + Dip^) + X,ir) , (4.44) 

which depends on three different kinematical variables q^,r and Ei, where q = pi + Pi, is the total 
four momentum of the leptonic system, r = p — q \s the four-momentum of the charmed hadronic 
final state, with invariant mass = M|-, and Ei is the lepton energy in the rest frame of the 
decaying b quark. This triple differential distribution can be decomposed, using the kinematics of 
the process and the symmetries of the theory, in terms of three structure functions, 



^[q- r E,) = 



dq^drdEi 



q^Wr{e,u) 



2vpi - 2vpivq + ^ ] W2if,u)+f {2vpi-vq)W3{f.u) 



(4.45) 



where 



-TO' V 



■- p/mi, and the quantities with a hat are dimensionless quantities normalized 



to TOfe. All the structure functions Wi{q ,u) have both a perturbative expansion in powers of as, 
and a nonperturbative expansion in powers of I /mi,, which can be computed in the framework of 
the heavy quark expansion. For example, the 0{as) corrections for all the differential distributions 
that can be constructed from Eq. 14.451 have become available recently 51, 52 . 

Typical observables which are accessible in experiments are convolutions of the differential spectra 
with suitable weight functions over a large enough range, with kinematical cuts. A particular case 
of these observables are the moments of differential decay distributions. In this thesis we will study 
the leptonic moments, defined as 



LniEo,p) 



dEi [El - pT / dq^dr {q\r, Ei) 

; J dq^drdEi 



(4.46) 



where E^ is a lower cut on the lepton energy, required experimentally to select this decay mode 
from the background from other B meson decays, and i?max is the maximum energy allowed from 
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the kinematics of the process that the lepton can have, 

9 9 
^ m R — TO n 

i?max = ^ ^ , 4.47 

where tob is the average of the mass of the neutral and charged B mesons, and similarly for mo- 
In particular we are interested in the behavior of the lepton energy distribution, defined as 

fir P f]^r 

which is related to the observable leptonic moments via 

Ln{Eo,fi)= dEi{Ei- fi)'' —{El) . (4.49) 

Jeo "-El 

Available published data for this process consist on moments of the lepton spectrum, Eq. 14.491 In 
Section 16.31 we reconstruct the underlying lepton spectrum, Eq. 14.481 from experimental informa- 
tion on its moments together with additional theoretical constraints, using the general strategy to 
parametrize experimental data described in Chapter |5l 

The lepton energy spectrum Eq. 14.481 in B ^ XJi^ decays can also be expanded in a power 
series both in as and in 1/mfc. The leading order spectrum in both expansions is given by 

^{B^ XM - ro2y2 [(1 _ /)2(i + 2/)(2 - y) + (1 - ff{l - y)] ^?(1 - y - p) , (4.50) 

where 

To = ^ (4-51) 
1927r'' 

is the total semileptonic decay rate in the parton model for massless final state quarks and we have 
defined 

P ml 2Ei 

/ = T^, p^^, y = — ■ 4.52 

I — y mi, 

The leading perturbative 0{as) corrections to this spectrum have been known for some time |54j . 
and there are estimations of the size of higher order terms 55 though the BLM expansion |56j . 
The leading nonperturbative ©(l/m^) corrections to the lepton energy spectrum were computed 
in Refs. |53[ I57j and the 0(1/to^) corrections in Ref. Similar expansions exist also for the 

lepton energy moments (see Ref. |5H and references therein), where nonperturbative corrections 
have been computed up to O (1/to^). Note that nonperturbative effects in the 1/mi, expansion ara 
parametrized by B meson expectation values that need to be extracted from experimental data. 
In Fig. 14.51 we show the lepton energy spectrum as measured from the Babar collaboration |6U| . 
before applying several experimental corrections, and the corresponding leading order theoretical 
prediction, Eq. I4.5UI for different values of the b quark mass. 

In Section ft). 31 we construct a neural network parametrization of the leptonic spectrum, Eq. 14.481 
from all the available experimental information on moments of this spectrum, supplemented by 
kinematical constraints. This application will show that the neural network approach can be used to 
reconstruct in a efficient way a function when the only available information comes from moments 
of this function. As a byproduct of such parametrization, we will provide also a determination of 
the b quark mass in the MS scheme TO6 (to^). 

4.1.4 Hadronic tau decays 

The hadronic decays of the tau lepton j^lH^l are for a variety of reasons a key process to study both 
the perturbative and non perturbative regimes of Quantum Chromodynamics |63II64| . Its importance 
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Figure 4.5: 

The folded lepton energy spectrum as measured by the Babar coUaboration as a function of the lepton 
momentum (left) and the corresponding leading order theoretical prediction (right). 



for precision studies of hadronic physics has been extensively studied [53], specially thanks to the 
high quality data provided by the LEP accelerator at CERN |()6[ I57| . Not only the hadronic tau 
decays provide one of the most precise determinations of the strong coupling as (Mr), but since 
the tau mass is not so large as compared to Aqcd, the non-perturbative effects, parametrized by 
vacuum condensates, can be extracted from experimental data in a clean way. In this Section we 
will briefly review the theoretical foundations of the QCD analysis of hadronic tau decays. 
The hadronic decays of the tau lepton (see Fig. 14.6(1 are of the form 

T XVr , (4.53) 

where X is an hadronic system, composed basically by pions, with vanishing total strangeness. 
The final hadronic state can be separated into scalar, vector and axial vector contributions, since 
parity is maximally violated in r decays. The hadronic invariant mass-squared s distribution can be 
measured for each decay channel. These invariant mass distributions dNy/j^/ds are related to the 
so-called spectral functions pi{s) by 

dNv/A is) 

Pv/a{s) = Kv/a{s)^^^ , (4.54) 

for vector (V) and axial- vector (A) final states, and where Ki(s) is a purely kinematic factor. In 
Fig. 14. 71 we show the contributions from the different decay modes to the vector and the axial vector 
spectral functions [HHl . 

Spectral functions are the observables that give access to the inner structure of hadronic tau 
decays. For reasons to be described in the following, we are in particular interested in the difference 
between the vector and the axial vector spectral functions, 

Pv-a{s) = pv{s) - pa{s) ■ (4.55) 

As spontaneous chiral symmetry breaking is a nonperturbative phenomena, the Pv/Ai^) spectral 
functions are degenerate in perturbative QCD with massless light quarks, so any difference between 
vector and axial-vector spectral functions is necessarily generated by non-perturbative dynamics. 
From perturbative QCD one expects that at some value of s large enough the perturbative result 

lim py-A(.)=pl?!"l^(.s)=0, (4.56) 
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is recovered, but current experimental data do not allow to draw any conclusion of how large s is in 
reality. Therefore, the spectral function py_^(s) is generated entirely from nonperturbative QCD 
dynamics, and provides a laboratory for the study of these non-perturbative contributions, which 
have resulted to be small and therefore difficult to measure in other processes where the pertur- 
bative contribution dominates. As will be discussed in brief, these nonperturbative contributions 
are organized by the Operator product expansion and are parametrized by matrix elements in the 
vacuum of local operators, the so-called QCD vacuum condensates. 

As it is well known |53) . the basis of the comparison of theoretical predictions with experimental 
data in the hadronic decays of the tau lepton is the fact that unitarity and analyticity connect the 
spectral functions of hadronic tau decays to the imaginary part of the hadronic vacuum polarization 
tensor, 

n^';^(g)^ I d^xe^^-^{0\T{Ut,ix)Ul',iOy)\0) , (4.57) 

of vector [/^^ = = qjj'^qi or axial vector [/^^ = A^- = Qjj^j^qi color singlet quark currents 
in corresponding quantum states. After Lorentz decomposition is used to separate the correlation 
function into its J = 1 and J — components, 

K^'M = i-a'-q' + q'q") ^.U?') + '/^'Z^'ng.U?') . (4-58) 
for non-strange quark currents one identifies 

^^^^^Iv/Ai^) = ^Pv/Ai^) ■ (4-59) 

Since as we have shown before the spectral functions Pv/a{s) can be measured experimentally, Eq. 
14.591 provides the basis for the comparison between theoretical predictions in the framework of the 
operator product expansion for the hadronic correlator, Eq. 14.571 and experimental data in terms of 
spectral functions, Eq. 14.541 This relation then allows us to implement all the technology of QCD 
vacuum correlation functions to hadronic tau decays. 
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Figure 4.7: 

The different contributions to the total vector (left) and axial-vector (right) spectral functions from the 
hadronic decays of the t lepton as measured by the Aleph detector. 



As has been mentioned before, the basic tool to study in a systematic way the power corrections 
introduced by nonperturbative dynamics is the operator product expansion. Since the approach of 
Ref. [23], the operator product expansion has been used to perform calculations with QCD on the 
ambivalent energy regions where nonperturbative effects come into play but still perturbative QCD 
is relevant. In general, the OPE of a two point correlation function H^"'' (s) takes the form 

n'-'H^)- E 7Z^ E C^'Hs,f^){Oifi)) , (4.60) 

D=0,2,4,... ^ ^ dimC>=r> 

where the arbitrary parameter /i separates the long distance nonperturbative effects absorbed into 
the vacuum expectation elements (0(/i)), from the short distance effects which are included in the 
Wilson coefficient C^'''(s, fi), and J is the parity of the correlator. The operator of dimension D — 
is the unit operator (the perturbative series) In the case we are interested in, the vector-axial vector 
correlator, the operator product expansion has no perturbative term and reads 

^^^%^A ^ ^ 02^1+4 ^2„+4(g^M') (02„+4(m')) , (4-61) 
n—1 

where we observe that the D = 6 term is the first non-vanishing non-perturbative contribution, 
in the limit of massless light quarks, to the pv-Ais) spectral function and, moreover, it has been 
shown to be the dominant one. Therefore, this spectral function should provide a source for a clean 
extraction of the value of the nonperturbative contributions from experimental data. 

Even if the pv~a{s) spectral function is purely non-perturbative, it has to satisfy several sum 
rules that can be derived rigorously from QCD. Sum rules have always been an important tool for 
studies of non-perturbative aspects of QCD, and have been applied to a wide variety of processes, 
from deep-inelastic scattering, as we have seen in Section 13. 1.21 to heavy quark bound states 
introduced in Section f4.1.3l Now we will review one of the classical examples of low energy QCD sum 
rules, the chiral sum rules. The application of chiral symmetry together with the optical theorem 
leads to low energy sum rules involving the difference of vector and axial vector spectral functions, 
Pv-a{s)- These sum rules are dispersion relations between real and absorptive parts of a two point 
correlation function that transforms symmetrically under SU (2)i(g)5[/(2)fl in the case of non strange 
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currents. Corresponding integrals are the Das-Mathur-Okubo sum rule j7U|. 

-r- / ds-pv-A{s) - ^-^^ - Fa , (4.62) 
47r Jq s 3 

where is the pion decay constant, Ir'^) the average radius squared of the pion and Fa the axial- 
vector pion form factor, as well as the first and second Weinberg sum rules j71| , 

/ dspv-Ais) = fl , (4.63) 



dsspv^A{s) =0 , (4.64) 

where in Eq. 14.631 the right hand side term comes from the integration of the spin zero axial 
contribution, which for massless non-strange quark currents consists exclusively of the pion pole. 
Finally, there is the chiral sum rule associated with the electromagnetic splitting of the pion masses 

. / dss\n-^pv^A{s) = --^{ml^-mlo) , (4.65) 



47r2 

where is an arbitrary scale, a the electromagnetic coupling and ttitt the charged and neutral pion 
masses. 

In Section 16.11 we will use our neural network technique to construct a parametrization of the 
Pv~a{s) spectral function from experimental data, including all the information on error and corre- 
lations, and which incorporates all the constraints from the chiral sum rules and from perturbative 
QCD. Using this parametrization we will provide a determination of the non-perturbative chiral 
vacuum condensates, defined in the operator product expansion of Eq. 14.611 whose extraction from 
experimental data has been the subject of active research in the recent years. 



4.2 Global fits of parton distribution functions 

As has been discussed in Section 14.1.21 parton distribution functions cannot be computed in per- 
turbation theory, but rather they have to be extracted from experimental data like deep-inelastic 
scattering. In this Section we review the standard approach to determine parton distributions from 
experimental data together with their associated uncertainties. In Appendix IbI we present a brief 
overview of the current status of the field of global fits of parton distribution functions. 



4.2.1 The standard approach 

Let us consider a set of parton distribution functions qi{x,Q'^), where i = 1, . . . , 2Nf + 1, since there 
is an independent parton distribution for each quark and antiquark flavor, as well as one for the 
gluon. As we have discussed before, the dependence of the parton distributions is determined 
by the DGLAP 38, 4(J, 39 evolution equations, Eqns. 14.281 and 14.311 so one needs to parametrize 
and determine from data only the x dependence of the parton distributions at an initial evolution 
scale Qq. In the standard approach, parton distributions are parametrized with relatively simple 
functional forms, that in full generality can be taken to be 

q,{x, Ql) = A,x''^ (1 - xy^P, (x, d„ e„ . . .) , t^l,...,2Nf + l . (4.66) 

The rationale for this functional form is that parton distributions parametrized this way follow quark 
counting rules [7S| at large x and Regge behavior [71] at small x, and Pi{x) is a smooth polynomial 
in X that interpolates between the small-x and the large-x regions. Note that neither of the two 
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limiting behavior (large-x and small-x) of the parton distributions can be derived in a rigorous way 
from Quantum Chromodynamics, so they are more phenomenological expectations rather that firm 
theoretical predictions. 

In principle one should parametrize and extract from experimental data the 2Nf + 1 independent 
parton distributions. In practice, however, one has to take some assumptions since available data 
cannot constrain all of them. For example, since there is scarce experimental information of the 
valence strange distribution s — s, it is typically set to be zero, s — s. Another example of this kind 
of simplifications was the assumption that the sea u and d distributions were the same. As more 
and better data becomes available, some of this assumptions are shown not to be true, and one has 
to allow more freedom in the parametrizations of the parton distributions. For example, the NMC 
measurements of the Gottfried sum rule (see |431 and references therein) showed that for the proton 
u{x, Q^) ^ d{x, Q^). Since that measurement, the assumption u = d was not used anymore in global 
fits of parton distributions. 

To be definite, we show now the explicit parametrizations from a recent global analysis of par- 
ton distributions 75^ from the MRST collaboration. The up and down valence distributions are 
parametrized as 

xuv{x,QI) = X (u — u) {x,QI) = Aux''''{l — xY" [l + duVx + Cux) , (4.67) 

xdv{x, Ql) = x{d~d) {x, Ql) = Adx^" (1 - xf" (l + ddV^ + edx) , (4.68) 
then the sea combination of parton distributions is given by 

xS{x,Ql)=AsX^^{l-xY^ {l + d,^/^ + esx) , (4.69) 

where the sea parton distribution is defined as 

xS{x,Ql)=2x{u + d + s){x,Ql), (4.70) 

and the gluon parton distribution is given by 

xg{x, Ql) = Agx^o (1 - x)"' (1 + dg^ + egx) - FgX^" (1 - xfo . (4.71) 

Finally, the structure of the light quark sea is taken to be 

2u, 2d, 2s = 0.45 + A, 0.45- A, 0.25, (4.72) 

xA(x, Ql)=x(d-u) {x, Ql) = Aas;^^ (1 - xf^ (l + c^x + d^x^) . (4.73) 

Note that the assumption of a vanishing strange valence distribution s = s is also used. Not all 
the parameters in the above equations are left free, in particular some of them are fixed by quark 
number conservation sum rules, 

1 .1 
uv{x,Ql) = 2, / dv{x,Ql) = \. (4.74) 

The total number of fitted parameters in this case is around 20. With the above relatively simple 
parametrization one can describe a wealth of hard-scattering processes, thus showing that QCD fac- 
torization 1341 holds in the majority of high energy processes involving strongly interacting particles. 

Note that Ref. [7S| allows for the gluon parton distribution to become negative, as can be seen 
from Eq. 14.711 This would appear to be in conflict with the interpretation of parton distributions 
as the probability distributions of the momentum that quarks and gluon carry inside the proton. 
However, it has been emphasized |76) that parton distributions are not physical quantities, in partic- 
ular beyond the leading order approximation they depend on the renormalization scheme. Physical 
quantities, on the other hand, like cross sections and structure functions, satisfy positivity bounds. 
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that is, even if parton distributions are allowed to become negative, structure functions are not since 
they are observable quantities and therefore are positive definite. 

The next step of the global fitting approach is to evolve the set of parton distributions that 
have been parametrized, Eq. 14.561 at the initial evolution scale Qq, to the scale where there 
is experimental data using the solution of the DGLAP evolution equations, Eqns. 14.281 and 14.311 
Then for each specific process one adds the contribution from the perturbative coefhcient functions 
to construct the corresponding observable, for example deep-inelastic structure functions, as in Eq. 
14.161 The number of hard-scattering processes that are nowadays used to constrain the shapes of 
the different parton distribution functions is rather large. These include 

• Deep-inelastic scattering structure functions, F2{x,Q'^), F^{x,Q'^) and Fi^{x,Q'^), both in 
charged lepton and in neutrino DIS, 

• The Drell-Yan process, pp Xll in hadronic collisions, 

• Jet production, both in e^p collisions and in pp collisions, 

• Gauge boson production pp W{Z)X, 

and other processes like prompt photon production and heavy quark production. In any case, it 
has to be emphasized that deep-inelastic scattering is and will be in the following years the most 
important source of information on parton distribution functions, specially the precision structure 
function measurements of the HERA collider (SU]. The LHC will not only use extensively parton 
distributions, but it will also be useful to constrain its shape, since it proves a kinematic region not 
accessible at available colliders, for example with processes like the differential rapidity distribution 
of gauge boson production pTT] . 

The final step of a global fit is to minimize a suitable statistical estimator, for example the 
diagonal statistical error 



N, 



X' ({«.}) = E -2 ■ (4-75) 

2—1 i,stat 

where pf'' is the experimental measurement and pj^'^'^^'^ {{o-i}) the theoretical prediction as a 
function of the parameters {ai} that describe the set of parton distributions. These parameters 
are determined by the condition that the statistical estimator is minimized, that is, one wants to 
determine the set of parameters {a'"'} that satisfy the condition 



(a|°))^min{,,}[x^({a,})] . (4.76) 



Until some years ago, in global fits of parton distribution functions the error function to be min- 
imized was the statistical error function, Eq. 14.751 where Ci^stat is the total uncorrelated statistical 
uncertainty. However the precision of modern experimental data made compulsory to consider the 
effects of the correlated systematic uncertainties. This was so both because the statistical accuracy 
of the data became higher and the experimental groups became to provide the contributions from 
the different sources of systematic errors with the experimental data. The inclusion of correlated 
systematics can be done with two equivalent definitions of the error function • The first one uses 
the explicit form of the experimental covariance matrix, 



where the covariance matrix of the experimental data is defined as 

(4.78) 
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where pij is the correlation matrix of experimental data and crtot j — y fftat j + ''^sys j total 
experimental uncertainty. Since the number of data points is typically large, the inversion of the 
covariance matrix might lead to numerical instabilities. For this reason, a equivalent form of Eq. 
14.771 was proposed that does not involve explicitely the covariance matrix. This second equivalent 
definition is given by 

x'-jhT.^i Pt'' - - E +T.^l, (4.79) 

JVdat .^^ O-.t^t,* V k=l ) fc=l 

where is the contribution from the K sources of correlated systematic uncertainties. In this 
approach one has to minimize this ^ both with respect to the parameters rfc, which determine the 
effect of correlated systematics, as well as with respect to the parameters {ai} defining the QCD 
model f'^'^'^^\ The problem with this definition is that the number of correlated systematics can 
be very large, which leads to a formidable minimization task. A way to overcome this difficulty is to 
perform the minimization with respect to the parameters analytically. In this case one obtains a 
simplified expression, 

X\{ai}) = E — — -Y.^M-'U'Bu, , (4.80) 



where we have defined 



K 

^fe({«i}) = E(^"')fefc'^'=' ' (4.81) 

k' = \ 

i3,({a,}) 2 ^ ^fc.' = 4..' + E ■ (4-82) 

^stat,2 i—\ ^stat,z 

Assuming that the measurements Fj^'^^^\ CTgtat.i and (3ik are in accord with normal statistics, the 
function defined in Eq. 14.801 should have the standard probabilistic interpretation. One can check 
explicitely that the two definitions are completely equivalent. The advantages and drawbacks of 
the two approaches in fits with covariance matrix error have been discussed in other contexts, like 
global fits of neutrino oscillations '77' . The main advantage of Eq. 14.791 is that it does not require 
the inversion of a covariance matrix. 

In Fig. 14.81 we show the result of a recent benchmark |3J QCD analysis of parton distribution 
functions. These results show the characteristic features of the different parton distributions: while 
the valence distributions have to vanish at small-x, due to quark number conservation, the gluon 
distribution grows very fast at small-x. The size of this growth depends on whether or not the 
parametrization of the gluon parton distribution at the initial evolution scale Qq is singular at 
X = 0. The error bands for the parton distributions are computed using some of the techniques 
discussed in the next Section. 

During some time it was though that the determination of the best-fit parton distributions were 
enough for practical phenomenological purposes. However, as better quality data became available, 
together with the progress in higher-order computations, it was realized that it was necessary to 
estimate in addition the uncertainties associated to the parton distributions. These uncertainties are 
of two classes: experimental uncertainties (since parton distributions are extracted from experimental 
data with errors) and theoretical uncertainties (for example the model dependence in the assumed 
shapes of the parton distributions, or the possible effects of non-standard evolution at low x |78|'). 
In the next Section we will review some of the standard methods used to estimate the uncertainties 
of the parton distributions coming from experimental data errors |79| . Theoretical uncertainties (see 
for example |80p are rather more complicated to estimate, and we will not review their treatment 
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Figure 4.8: 

The benchmark parton distribution functions with associated uncertainties as described in Ref. the 
valence down distribution (left) and the gluon distribution (right). 



here. This is so because theoretical errors are not gaussian, and the best one can do to estimate 
their effects is to provide some suitable prescriptions, but no rigorous statistical analysis is available 
for these type of uncertainties. 

4.2.2 Uncertainties in parton distributions 

The first method to estimate the effects of the experimental uncertainties in the parton distributions 
is called the Offset method [HI], which has mostly been used by the QCD analysis of the ZEUS and 
HI collaborations. In this method the systematic uncertainty parameters in Eq. 14.791 are fixed 
to zero in the central fit so that the fitted parameters are as close as possible to the central values 
of published data. Then they are taken into account for the error analysis. This is done in the 
following way: in addition to the usual Hessian matrix, 

„ d^x"^ ({Qm},{rn}) 

-Hy = — — , (4.8d) 

oaiOa 



defined with respect to the set of parameters {a;} defining the QCD model, a second Hessian matrix 
is defined as 

oajOTk 

which contains information on the variations of the error function with respect to the systematic 
uncertainties parametrized by the parameters. Then the covariance matrix for the systematic 
uncertainties is given by 

(jsys ^ H-^VV^H-^ , (4.85) 

and the total covariance matrix is constructed as the sum from the two contributions, the statistical 
and the systematic, 

C'°* = C"*''* + C"y" , (4.86) 
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where C*^* = H ^ is the standard statistical covariance matrix. In this method the uncertainty in 
any quantity that depends on the parton distribution functions, J- [{qt}] is computed from 

TV 

by substituting C by the appropriate covariance matrices, C'stat^j^sys ^^^^ ^^tot obtain the total 
statistical, correlated systematic and total experimental error band respectively. A^par is the total 
number of parameters used in the parametrization of the set of parton distributions at the starting 
evolution scale, Eq. 14.661 This method is not statistically rigorous, but it has the virtue that it does 
not assume gaussianly distributed systematic uncertainties. It gives a conservative error estimate 
as compared with other methods, like for example the Hessian method with Ax^ = 1 rule, to be 
discussed in brief. This method suffers two serious drawbacks: first of all it assumes that linear 
error propagation gives a decent estimate of the total error propagation, an assumption that has 
been shown to be not correct in many cases. Second, as can be seen from the above formulae, the 
estimation of the uncertainties depends heavily on the functional form model than one has assumed 
for the set of parton distributions. 

Another popular method is the so-called Hessian method. In the Hessian method ^21 one assumes 
that the deviation in for the global fit from the minimum value is quadratic in the deviation of 
the parameters specifying the input parton distributions {fli} from their values at the minimum 
{a°}. First one determines the best fit set of parton distributions from the minimization of Eq. 
14.791 that is, including the contribution from correlated systematic uncertainties, as opposed to the 
Offset method. Then to estimate the associated uncertainty one can write 

n n 

Ax'^x'-xl = Y.J2 - «") - «") ' (4-88) 



where H is the Hessian matrix, defined in Eq. 14.831 Standard linear propagation implies that the 
error on an observable T [{ft}] is given by 

N 

(A^)^ = Ax^E^^r(W})^^ (4.89) 



where the covariance matrix of the parameters C*'*** is again the inverse of the Hessian matrix, 
Eq. 14.831 and Ax^ is the allowed variation in x^- Textbook statistics imply that one should have 
Ax^ = 1, however it has been argued that a higher value, Ax^ = 100 is required in order to estimate 
the uncertainties in a faithful way, due to the fact that data from different experiments is sometimes 
incompatible {53 . 

For practical purposes, it is more numerically stable to diagonalize the covariance matrix and 
work in the basis of eigenvectors, defined by 

Hij ({a;}) Vjk = AfcWife i, /c = 1, . . . , TVpar • (4.90) 

One has to take into account that since variations in some directions in the parameter space lead 
to deterioration of the quality of the fit far more quickly than others, the eigenvalues span 
several orders of magnitude. In the Hessian method, one ends up with a set of 2A'par sets of parton 
distributions . One can therefore propagate the uncertainty associated to the parton distributions 
to any given observable T \{<ii\\ with the following master formula, equivalent to Eq. 



{ATf = J E {^^{St})-T{{S-})\^ . (4.91) 
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The drawbacks of this method are similar to those of the Offset method: first of all one assumes 
that the linearized approximation in error propagation is valid, and second errors estimated with 
this method depend heavily on the functional form choosen for the parametrization of parton dis- 
tributions. Finally the introduction of non-standard tolerance criteria Ax^ > 1 does not allow to 
give a statistically rigorous meaning to the resulting uncertainties. 




Figure 4.9: 

Relative errors of parton distributions in the recent CTEQ6 analysis j84|; up quark parton distribution 

(left) and gluon parton distribution (right). 

The final technique that will be discussed is the Lagrange multiplier method which overcomes 
some of the drawbacks of the above methods, specially the linearized approximations. To investigate 
the allowed variation of a specific physical observable is more rigorous using this method that the 
previously discussed ones. In the Lagrange multiplier approach, one performs a global fit while 
constraining the values of a physical quantity [{qi}] in the neighborhood of their values ^^"^ 
obtained in the unconstrained global fit. The starting point is the best- fit set of parton distributions 
S'o characterized by the parameters {0^°''}. The uncertainty associated to the observable T is 
estimated in two steps. First we use the Lagrange multiplier method to determine how the minimum 
of iWi}) increases, as deviates from the best estimate and then one determines the 

appropriate tolerance of {{o-i})^ ^X^- The first step is then minimizing the constrained error 
function 

vf (A, {a,}) = i{a^}) + XT ({a,}) , (4.92) 
for various values of A, following the chain 

^ min [* (Ac., {aj}] {a, (A„)} ^ xl- (4.93) 

This procedure generates a parametric relationship between x^ and the observable, in terms of 
the parameter A, so that given an allowed value of A^^, it is straightforward to derive an allowed 
range for the observable AJ^, without any linearized approximation. In practice this procedure 
generates a sample of sets of parton distributions {Sn\ equal to the size of the parton distribution 
set parameter space A'par, since the minimization is performed in each direction of the parameter 
space, which are then used to assess the range of variation of T allowed by the data, using Eq. 14.891 
It is clear that the Lagrange multiplier method provides a sample of parton distribution sets 
tailored to assess the uncertainty associated to the physical problem at hand. This procedure does 
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Figure 4.10: 

Set of parton distributions with uncertainties computed with different methods from a recent QCD 

analysis of the ZEUS collaboration |86| . 



not involve some of the approximations involved in the Hessian and Offset methods, but it still has 
the problem of the introduction of non-standard tolerance criteria. However, this method suffers 
from a large practical disadvantage, which is that the global fits of parton distributions must be 
repeated every time one needs to determine the uncertainties of a different observable coming from 
parton distributions. 

In Figs. 14.91 and 14. 101 we show the errors associated to different parton distributions from two 
different QCD global analysis. Note that the gluon parton distribution in particular has a rather large 
uncertainty, specially at large and small x, where there are no direct constrains from experimental 
data on its shape. The valence parton distributions, on the other hand, are known with higher 
accuracy since they are directly constrained from the precise deep-inelastic scattering data. 

It must be emphasized that whatever method is used, nowadays there is a common format 
to represent the uncertainties in parton distribution. This is accomplished by providing, on top 
of the best- fit parton distribution set, additional sets of error parton distributions, that span the 
parameter space of the parton distributions allowed by the corresponding experimental uncertainties, 
estimated with some of the methods described above. With the sample of parton distribution sets, 
one computes the uncertainty in any physical quantity [qi] by means of Eq. 14.911 All the modern 
sets from different global QCD analysis of parton distribution function, including the error sets, are 
available through the LHAPDF library fgTj . 

The standard approach introduced in this section for the determination of unpolarized parton 
distributions and the associated uncertainties is also used for similar global QCD analysis which 
involve different types of parton distributions, like polarized parton distributions |88l 1891 1^ . which 
measure the fraction of the total spin of the proton carried by the different partons, or nuclear 
parton distributions 91. i)2 , 93 which measure how parton distributions are modified in nucleus 
within heavy nuclei with respect to those of the free nucleon. Also for parametrizations of the 
photon 94 and pion J95! parton distributions from experimental data the standard approach is 
used. However, in all the above cases experimental data is more scarce and with larger uncertainties 
than in the case of unpolarized structure functions. 
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In Section lS.ll we will introduce an alternative approach to estimate faithfully the uncertainties 
associated to a function parametrized from experimental data. This approach (the Monte Carlo 
approach) can be seen to be equivalent to the more common technique to determine confidence 
levels based on the Ax^ = 1 condition, assuming that in the latter case linear error propagation is a 
good enough approximation. In Appendix I A . 21 we show explicitely this equivalence within a simple 
model. 
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Chapter 5 



The neural network approach: 
General strategy 



This Chapter constitutes the core of the present thesis: we describe the strategy that will be used 
to parametrize experimental data in an unbiased way with faithful estimation of the associated 
uncertainties, using a combination of Monte Carlo techniques and neural networks as unbiased 
interpolants. This strategy has three main parts. In the first part, one constructs a Monte Carlo 
sampling of the experimental data for a given observable, which determines the probability measure 
of this observable over a finite set of data points. Then we use neural networks as basic interpolating 
tools, to construct a continuous probability density for the observable under consideration. This 
parametrization strategy has the advantage that it does not require any assumption, rather than 
continuity, on the functional form of the underlying law fulfilled by the given observable. It also 
provides a faithful estimation of the uncertainties, which can be then propagated to other quantities 
without the need of any linearized approximation. Finally the third step consists on the statistical 
validation of the constructed probability measure in the space of the parametrized function by means 
of suitable statistical estimators. In summary, in this Chapter we will describe how given a measured 
observable F, the associated probability measure in the space of functions, V [F], can be constructed, 
so that expectation values of arbitrary functionals of F, J-[F] can be computed as with standard 
probability distributions, that is 



In Fig. 15. II we show a summary of our parametrization strategy for the particular case of the proton 
structure function. 

5.1 Monte Carlo sampling of experimental data 

In this first Section we describe how we construct a Monte Carlo sample of replicas of the experi- 
mental data. We discuss also the statistical estimators which are used to validate the results of the 
artificial data generation. It has to be emphasized that the Monte Carlo sampling of the data allows 
to estimate in a faithful way the errors coming from experimental uncertainties associated to the 
function to be parametrized. Note that this technique to assess the uncertainties in a function is 
completely independent of the method used to parametrize this function, and in particular it is inde- 
pendent of the fact that we use neural networks. This means that for example this technique could 
be used in standard fits with polynomial functional forms to determine the associated uncertainties. 
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Figure 5.1: 

Summary of the strategy presented in this thesis to parametrize experimental data in an unbiased way 
with faithful estimation of the uncertainties for the particular case of the proton structure function 
i^2(a;,Q'). As can be seen, the three main steps of our approach are the Monte Carlo generation of replicas 
of experimental data, the neural network training and finally the statistical validation of the results. 



5.1.1 Artificial data generation 

Let us consider a given observable F, which in full generality can be assumed to depend on several 
kinematical variables {xi,X2,X3, . . .). If A^dat is the number of experimental measurements of the 
observable F, we define the i-th data point as 

^(cxp) _ 



Fr 



F{xii,X2i, . . .) , 



» = 1, 



dat 



(5.2) 



The first step in our approach is to construct the probability measure of the experimental data for 
these A^dat data points. This is obtained by means of a Monte Carlo sampling of the experimental 
data in terms of replicas of the original measurements. 

There exist two equivalent techniques to generate the replica sample, depending on how the 
information on the experimental uncertainties is available. In the first situation, information on 
all the different independent sources of uncertainty (statistical errors, correlated and uncorrelated 
systematics, and normalization errors) is available. This is the case for experiments like deep- 
inelastic scattering for which the detectors and the different sources of correlated uncertainties are 
well understood. In this case the k-th replica of the experimental data F^'^'^*)'^'^^ is generated as 



I^(art)(fc) /, I (fe) 



^(cxp) 



(fc) 



dat ; 



A: = 1, 



rcp 



(5.3) 

Let us describe the different elements of the above equation. A^rcp is the number of Monte Carlo 
replicas that needs to be generated, while pj^^^^'' are the A^dat experimental values of the observable 
considered. The total uncorrelated uncertainty is defined as 



3 = 1 



(5.4) 
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as the quadratic sum of the statistical uncertainty (Tstat.i and the Nu uncorrelated systematic uncer- 
tainties CTsysji- The contribution to the i-th data point from the j-th source of correlated systematic 
uncertainty is denoted by dsysji, and is the total normalization uncertainty. Finally r^j!^\r[''^ 

(k) 

and r^' , are zero mean univariate eaussian random numbers with the same correlation as the cor- 

responding uncertainties. For example, within a given experiment, for all the data points of the 

(k) 

k-th replica we use the same random number for normalization uncertainties, r)^ , since this uncer- 
tainty is correlated among all these measurements. The condition that these random numbers are 
gaussianly distributed implies that 

hm =o(^) , hm (r^) = I + O ( ^) . (5.5) 

W.cp-oo\* /rep V^rcp/ iV„p-*oo \ * /rep V ^rep / 

The covariance matrix is constructed from this experimental information as 

E '^.ys,Ha.y,,, -t- 1 + S,,al , (5.6) 

and the correlation matrix is then given by 

Pro = > (5-7) 

CTtot,i<7tot j" 



where the total error dtot.i for the i-th point is given by 



t,i ' sys,i 

and finally the total correlated uncertainty (Jsys,i is the sum of all correlated systematics. 

The presence of correlated systematic uncertainties which are not symmetric for some experi- 
ments deserves further comment. As it is well known [961 1971 1^ I99| . asymmetric errors cannot be 
combined in a simple multi-gaussian framework, and in particular they cannot be added to gaussian 
errors in quadrature. For the treatment of asymmetric uncertainties, we will follow the approach of 
Refs. PHI EH], which, on top of several theoretical advantages, is closest to the ZEUS error analysis 
and thus adequate for a faithful reproduction of the ZEUS data on deep-inelastic structure functions, 
that is were these asymmetric uncertainties appear. In this approach, a data point with central value 
xq and left and right asymmetric uncertainties uji and (not necessarily positive) is described by 
a symmetric gaussian distribution, centered at 

(^> = ^0 + '^^^ , (5.10) 



and with width 



al^A^^ ) . (5.11) 



The ensuing distribution can then be treated in the standard gaussian way. 

Once the sampling of the experimental data in terms of a set of replicas of artificial data has been 
generated, it defines the probability measure of the observable F in those data points where there 
exist experimental measurements. From this probability density one can compute any estimator as 
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with standard probability distributions. That is, if A [F] is any functional of the observable F (the 
simplest example is the observable itself for the i-th data point, A [F] — Fi ) then its average as 
computed from the probability measure is given by 

1 

/.CP iVdat ^ ' 

Experimental data on the observable F might be available without information on the separation 
of the different sources of systematic uncertainties, which are then in general collected together in 
the correlation matrix p^*^^^"^ of the experimental data. In this second case, the equivalent of Eq. 15.31 
is given by 

^(art)(fe) ^^(cxp)^^(fc)^^^^^^ , (5.13) 

where {r\'''^} are gaussianly distributed random numbers with the same correlation as the experi- 
mental data points, that is, they verify 

^ / ^cp \' /rcp\^ /.CP +oM ^^(-P), (5.14) 



/ rep \ / rep ]/ \ ■' I rep \ / rep 

where averages over replicas have been defined in Eq. 15.121 In this situation the covariance matrix 
can be computed using 

covjj = Pyf7tot,iO-totj • (5.15) 

The number of replicas of the experimental data A^dat that needs to be generated with any 
of the two equivalent approaches described above is determined by the condition that the Monte 
Carlo sample reproduces the central values, errors and correlations of the experimental data. The 
comparison between experimental data and the Monte Carlo sample can be made quantitative by 
the use of statistical estimators that will be described in the following Section. 

5.1.2 Statistical estimators: generated replicas vs. experimental data 

In this Section we describe the statistical estimators that are used to validate the results of the 
replica generation. Let us recall that the probability measure constructed in the previous section 
aims to reproduce not only central values but also errors and correlations. Therefore we must also 
validate how well errors and correlations are reproduce by the replica sample. These estimators are 
the following: 

• Average for each experimental point 



,(art) ^ ^ 

rep TVrep — 

Associated variance 



1 f 

F^""'-' ) — ^(art)(fe) j-j. ^g-j 



^(art)^ ///^(art)^n _/^(art)\2 ^ (^^^^^ 



Associated covariance and correlation 



■ ^(art) ^(art) \ _ / ^(art) \ / ^(art) ^ 
(art) _ \ ' ^ I rep \ ' / rep \ / rep /_ , 
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(art) (art) fart) fart) ^ r~i\ 

COV^j = Pij CTi cr] • (5-19) 

The three above quantities provide the estimators of the experimental central values, errors 
and correlations which one extracts from the sample of experimental data. 

Mean variance and percentage error on central values over the A^dat data points. 



V 



PE 



p(art) 



^(art) 



dat 



, A^dat 



(art) 



F, 



(oxp) 



rep 



dat 



Nd^t 



E 



F, 



(art) 



F, 



(oxp) 



(exp) 



(5.20) 



(5.21) 



We define analogously (v (cr*'''''^),„p )^ , (v (/5'^'*')j.^p )^ , (v (covt^^*')^.^^ , 

and (^PE (cov('^''*))^^p ^ 



and (^PE 


"/^(art)\ 


>d ' (^^ 


"/p(art)\ 




\ / rep 


/ dat \ 


/ rep 


relations and covariances respectively. 



dat 

for errors, cor- 



dat 



dat 



These estimators indicate how close the averages over generated data are to the experimental 
values. Note that in averages over correlations and covariances one has to use the fact that 
correlation and covariances matrices are symmetric matrices, and thus one has to be careful 
to avoid double counting. For example, the percentage error on the correlation will be defined 
as 



PE 



Jait) 



rep 



dat 



iVdat (A^dat + 1) ^ ^ 



(art) 



(oxp) 



„(oxp) 



(5.22) 



and similarly for averages over elements of the covariance matrix. 
Scatter correlation: 



^(art) 



^(oxp) /^(art)\ \ _/^(exp)\ //p(art)\ \ 

^ ^'"P/dat ^ /dat \\ /'■op/dat 



(exp) (art) 



where the scatter variances are defined as 

r(oxp) _ 



<7l 



(^(exp))^^^^^„((^(exp))^J^ 



(art) 



^(((^<-*Orep)')^^^-(((^^-*^)rep> 



dat 



(5.23) 

(5.24) 
(5.25) 



We define analogously r [o-^*"^*)] , r [p'^''*'] and r [cov'^''''*^] . Note that the scatter correlation 
and scatter variance are not related to the variance and correlation Eqns. 15. 1715.151 The 
scatter correlation indicates the size of the spread of data around a straight line. Specifically 

r [a^'^''*)] = 1 implies that (cr^^^^^) is proportional to a^^^^\ 



Average variance: 



(art) \ _ 



7 



dat N, 



dat T 



(art) 



(5.26) 
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We define analogously (p'-'''^''') ^^t (*^'^^*''''^*'')dat' ^^^^ corresponding experimental 

quantities. This quantities are interesting because even if the scatter correlations r are close 
to 1 there could still be a systematic bias in the estimators Eqns. 15. 1715.151 This is so since 
even if all scatter correlations are very close to 1, it could be that some of the Eqns. 15. 1715.151 
where sizably smaller than its experimental counterparts, even if being proportional to them. 

The typical scaling of the various quantities with the number of generated replicas A^rcp follows 
the standard behavior of gaussian Monte Carlo samples |1U(J| . For instance, variances on central 
values scale as l/A^icp, while variances on errors scale as 1/ y^N^ep- Also, because 

as can be checked using Eq. IA.6I in Appendix lA.ll the estimated correlation fluctuates more for 
small values of p^°'^^\ and thus the average correlation tends to be larger than the corresponding 
experimental value. 



5.2 Neural networks 

5.2.1 Neural networks as unbiased interpolants 

Artificial neural networks |1(J11 11021 ll(J3| provide unbiased robust universal approximants to incom- 
plete or noisy data. An artificial neural network consists of a set of interconnected units [neurons). 
The activation state of a neuron is determined as a function of the activation states of the 
neurons connected to it. Each pair of neurons is connected by a synapsis, characterized by a 
weight Uij. In this thesis we will consider only multilayer feed- forward Perceptrons. These neural 
networks are organized in ordered layers whose neurons only receive input from a previous layer. In 
this case the activation state of a neuron in the (l+l)-th layer is given by 

^t'^ ^ 9 {ht'^) , t = l,...,ni+,, l = l,...,L-l, (5.28) 

n{l) 

^^'^-E-?^'+^^^ (5-29) 

where O^''^ is the activation threshold of the given neuron, n; is the number of neurons in the 1-th 
layer, L is the number of layers that the neural network has, and g{x) is the activation function of 
the neuron, which we will take to be a sigmoid, 

1 + e ^ 

except in the last layer, where we use a linear activation function g{x). This enhances the sensitivity 
of the neural network, avoiding the saturation of the neurons in the last layer. The fact that the 
activation function g(x) is non-linear allows the neural network to reproduce nontrivial functions. 
An schematic diagram of a feed-forward artificial neural network can be seen in Fig. 15.21 

Therefore multilayer feed-forward neural networks can be viewed as functions F : T?."^ TiP"^ 
parametrized by weights, thresholds and activation functions. 



, J = l...,ni . (5.31) 



It can be proven that any continuous function, no matter how complex, can be represented by a 
multilayer feed-forward neural network. In particular, it can be shown |ll)ll I1U3| that two hidden 
layers suffice for representing an arbitrary complicated function |1U4 | . 
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Figure 5.2: 

Schematic diagram of a multi-layer feed forward neural network. 



Artificial neural networks are a common tool in experimental particle physics |1(J2[ IIUSL ll(J6| in 
applications like pattern recognition, where in this context a pattern typically means a experimen- 
tal measurement as a function of a set of several variables which characterize this event. We are 
interested in artificial neural networks in order to parametrize physical quantities in an unbiased 
way, that is, without the need of any assumptions on its functional form behavior. These physical 
quantities to be parametrized can be either direct experimental measurements (like in the case of 
deep inelastic structure functions), or they can be related to experimental data through a functional 
dependence (like in the case of parton distribution functions). Neural networks not only provide 
universal unbiased interpolants for given physical quantity in the region were there is experimental 
information, they also have the property that their behavior in extrapolation regions is not deter- 
mined by their behavior in the data region, as happens in fits with simple functional forms, so 
they are also useful to assess the uncertainty in extrapolation regions (that is, in regions without 
information from experimental measurements). 

Therefore, the performance of neural networks is worse for extrapolation than for interpolation. 
That is, new input patterns that are not a mixture of some of the input training patterns cannot be 
classified in an efficient way in terms of the learnt patterns. In particular, for a feed forward neural 
network with one hidden layer, the generalization error (that is, the error in the classification of 
patterns which are very different from the patterns used in the training) is of the order O (iVpar/A'dat) 
|107| . where iVpar is the number of neural network parameters and iVdat the number of training 
patterns. 

Another interesting property of artificial neural networks is that they are efficient in combining in 
an optimal way experimental information from different measurements of the same quantity. That 
is, when the separation of data points in the space of possible input patterns is smaller than a 
certain correlation length, the neural network manages to combine the corresponding experimental 
information, thus leading to a determination of the output pattern (in the present case the function 
to parametrize) which is more accurate than the individual input patterns (the measurements from 
the different experiments). 

5.2.2 Training strategies 

Training a neural network implies the minimization of a suitable statistical estimator. In this Section 
we discuss the different estimators that are minimized in the neural network training. In the next 
Section we will analyze the minimization algorithms that are used to implement this training. 
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For this thesis three different estimators have been used: the central values error function 



=1 

the diagonal statistical error function, 



iat - T 



where at,i is the total uncorrelated uncertainty, Eq. 15.41 and finahy the error taking into account 
correlated systematics, 



= |^^(not)(fe) _ ^(art)(fe)^ / j |^^(not)(fc) _ ^(art)(fc) 



(5.34) 



where in all the above expressions jg prediction of the k-th neural network for the i-th 

data point. Note that in the above equation we have defined the covariance matrix cov'^'^-'y with 
the normalization uncertainty included as an overall rescaling of the error due to the normalization 
offset of that replica, namely. 



ith 



^ ^ fJ^ sys,pif^^ sys,p_7 1 "t" ^ij^^ ^ t,i ; (5.35) 



^al = a + r^N<^N)aa., , (5.36) 



where rj^ is the same as in Eq. 15.31 Eq. 15.351 is to be compared with the total covariance matrix, 
Eq. 15.61 This is necessary in order to avoid a biased treatment of the normalization errors |98j . In 
Appendix I A . 31 we describe within an explicitely solvable model the effects of an incorrect treatment 
of normalization uncertainties. 

Several relations hold between these estimators. Properties of minimization using the error 
function defined with the covariance matrix, Eq. 15.341 can be found in Refs. |98II108) . For example, 
it can be shown that the following relation holds 

) > E^'"^ , (5.37) 

since in the latter case correlated systematic uncertainties are included. The above constraint is 
useful to cross-check that the experimental covariance matrix Eq. 15. 351 has been correctly computed 
and inverted, without numerical instabilities. 

The usefulness of neural networks is due to the availability of a training algorithm. This algorithm 
allows one to select the values of weights and thresholds such that the neural network reproduces 
a given set of input-output data, also known as patterns. This procedure is called learning since 
unlike a standard fitting procedure, there is no need to know in advance the underlying rule which 
describes the data. In our case the patterns that must be learned are those that minimize any of the 
estimators described above. Therefore we need to define a suitable minimization strategy to train 
the neural networks. During the course of this thesis we have used different minimization algorithms, 
which are described in the following Section. 



The neural network approach to parton distributions 



64 



5.2.3 Minimization algorithms 

Three different minimization algorithms have been used during the course of this thesis: back- 
propagation, genetic algorithms and conjugate gradient, which we now discuss in turn. 

• Back-propagation. 

Back-propagation for neural network training was throughly described in Ref. (llj and we 
summarize its main features here. Back-propagation is a minimization strategy specially suited 
for the training of neural networks with diagonal error functions. Let us assume that the error 
function used in the minimization is diagonal and a neural network with a single output layer, 
then one has 



Wdat 2 

fc=i 

where i^^'^"*^ is the output of the neural network, pj,'^^^'' — ^[^^ when the input of the neural 

network is ^l^"* = xik- If the error function is non-diagonal, back-propagation still can be used 
but the corresponding expressions become more complicated. 

The error function is a function of the weights and the thresholds of the neural network, 
and can be minimized by looking for the direction of steepest descent in the space of weights 
and thresholds, and modifying the parameters in that direction. 



sor = -V- 



where rj is the so-called learning rate. The non-trivial result on which back-propagation training 
is based is that it can be shown that the steepest descent direction is given by the following 
recursive expression: 

Wdat 

5J^ = E Af ^ = l,...,nu j = 1, . . . , , (5.40) 

fc=i 

A'dat 

5ef>=-Y,l^\ .-l,...,n, , (5.41) 

k=l 

where the information on the goodness of the fit is fed to the last layer through 

A*^'' = g' (/if [i^^"^'^ - ^^^'^'^P^j , (5.42) 

where the h function is defined in Eq. 15.291 and then back-propagated to the rest of the 
network by 

"1 

Af-i)'==g'(#)^-)X:Af'=c.(j) . (5.43) 

The procedure is iterated until a suitable converged criterion is satisfied, as will be described 
in brief. To avoid getting stuck in local minima, it is useful to add a momentum term in the 
variations of the parameters. This means that Eq. 15.391 should be replaced by 

5u;«=-7y|^+a<54^(last) , (5.44) 
<50« (last) , 
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where (last) indicates the use of the last value of the update for the weights and the thresholds 
before the current one and a determines the size of the momentum term. The drawbacks of 
back-propagation minimization is that it is not specially suited for non-linear error functions, 
or for error functions which depend on the neural network output through convolutions. 

One major improvement of this minimization algorithm during the course of the present thesis 
was to implement weighted training, in order in increase the efficiency of the algorithm when 
the experimental data consist on many different experiments. Since weighted training does 
not depend on the minimization algorithm, we discuss it later in this Section. 

• Genetic Algorithms. 

Genetic algorithms^ are the generic name of function optimization algorithms that do not suffer 
of the drawbacks that deterministic minimization strategies^ have when applied to problems 
with a large parameter space. Genetic algorithms for neural network training were introduced 
in this context in and it is a minimization strategy that has been used in different high 
energy physics applications |109j . This method is specially suitable for finding the global 
minima of highly nonlinear problems, as is the one we are facing in this thesis. Genetic 
algorithms have several advantages with respect to deterministic minimization methods: 

1. They simultaneously work on populations of solutions, rather than tracing the progress 
of one point through the parameter space. This gives them the advantage of checking 
many regions of the parameter space at the same time, lowering the possibility that a 
global minimum gets missed. 

2. No outside knowledge such as local gradient of the minimized function is required. 

3. They have a built-in mix of stochastic elements applied under deterministic rules, which 
improves their behavior in problems with many local extrema, without the serious per- 
formance loss that a purely random search would bring. 

All the power of genetic algorithms lies in the repeated application of three basic operations: 
mutation, crossover and selection, which we describe in the following. The first step is to 
encode the information of the parameter space of the function we want to minimize into an 
ordered chain, called chromosome. If A^par is the size of the parameter space, then a point in 
this parameter space will be represented by a chromosome. 



In our case each bit ai of a chromosome corresponds to either a weight or a threshold 

^ of a neural network. Once we have the parameters of the neural network written as a 
chromosome, we replicate that chain until we have a number Atot of chromosomes. Each 
chromosome has an associated fitness /, which is a measure of how close it is to the best 
possible chromosome (the solution of the minimization problem under consideration). In our 
case, the fitness of a chromosome is given by the inverse of the function to minimize 



so chromosomes with larger fitness correspond to those with smaller value of the function to 
minimize x^- 

Then we apply the three basic operations: 

^See for example Refs. I109linfll0l and references therein. In particular we follow closely the description of genetic 
algorithms of Ref. IllOI . 

■^The canonical example of a deterministic minimization algorithm is the MINUIT routine lllll from the CERN 
software libraries 




(5.45) 
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— Mutation: 

Select randomly a bit (an element of the chromosome) and mutate it. The size of the 
mutation is called mutation rate ry, and if the k-th bit has been selected, the mutation is 
implemented as 

Ofc ^ flfc +?? (^r - , (5.47) 

where r is a uniform random number between and 1. Over the span of several genera- 
tions, even a stagnated chromosome position can become reactivated by mutation. The 
optimal size of the mutation rate must be determined for each particular problem, or it 
can be adjusted dynamically as a function of the number of iterations. 

— Crossover: 

This operation helps in obtaining a child generation which is genetically different from the 
parents. Crossover means selecting at random pairs of individuals, for each pair determine 
randomly a crossover bit s, and from this crossover point interchange the bits between 
the two individuals. 

— Selection: 

Once mutations and crossover have been performed into the population of individuals 
characterized by chromosomes, the selection operation ensures that individuals with best 
fitness propagate into the next generation of genetic algorithms. Several selection opera- 
tors can be used. The simplest method is to select simply the iVchain chromosomes, out of 
the total population of A'tot individuals, with best fitness. Later we will describe a more 
efficient selection method based on probabilistic selection. 

The procedure is repeated iteratively until a suitable convergence criterion is satisfied. Each 
iteration of the procedure is called a generation. A general feature of genetic algorithms is 
that the fitness approaches the optimal value within a relatively small number of generations. 

The theoretical concept behind the success of genetic algorithms is the concept of patterns or 
schemata within the chromosomes |112) . Rather than operating only with iVtot individuals in 
each generation, a genetic algorithm works with a much higher number of schemata that partly 
match the actual chromosomes. If for simplicity we consider chromosomes whose elements 
can take only binary values, the concept of schemata means that a chromosome like 10110 
matches 2^ schemata, such as or 1*1*0. The generalization of this concept to continuous 

parameters is straightforward. Since fit chromosomes are handled down to the next generation 
more often than unfit ones, the number of copies of a certain schema S associated with fit 
chromosomes will increase from one generation to the next, 

ns{t + l)=ns{t)^ , (5.48) 

/tot 

where f{S) is the average fitness of all individuals whose chromosome match schema S and 
/tot is the average fitness of all individuals. This implies that if we assume a certain schema 
approximately giving all matching chromosomes a constant fitness advantage c over the average 

f{S) = (1 + c) /tot , (5.49) 

one obtains an exponential growth of the number of this schema from one generation to the 
next, 

ns{t)=ns{0)a + cY ■ (5.50) 



The above derivation is a rough approximation to the behavior in realistic cases, and it needs 
to be corrected for effects like mutation. To this purpose we define two measures on schemata: 
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1. The defining length S is the distance between the furthest two fixed positions. In the 
above example, the defining length in 1*1*0 is (5 = 4. 

2. The order o of a shema is the number of fixed positions it contains. In the above example, 
= 3. 

With these measures, if L is the length of a chromosome (that is, the number of parameters 
-^par of the function we are minimizing) the following bound can be derived 



/tot 



1 - -o{S)p„ 



(5.51) 



The first correction includes the effect of crossover and the second implements the effects of 
mutations, since in a schema of order o, there is a probability (1 — Pm)° (1 — opm) that the 
schema survives mutation, where Pm = 1 /-^par is the probability to select a bit at random out 
out the total TVpar bits of a chromosome chain. A consequence of Eq. I5.51l is that short, low- 
order schemata of high fitness are the building blocks towards a solution of the problem. During 
a run of the genetic algorithms, the selection operator ensures that building blocks associated 
with fitter individuals propagate through the population. It can be shown |112j that in a 
population of size A'tot, approximately 0{N^^^) schemata are processed in each generation. 

The basic genetic algorithms that we have introduced above can be extended in many ways to 
address specific problems. Several improvements over this basic version of the algorithm have 
been implemented in the course of this thesis. The first one is the introduction of multiple 
mutations Ny^ut > 2. This is helpful to avoid local minima, thereby increasing the speed of 
training. It is crucial that rates for these additional mutations are large, in order to allow for 
jumps from a local minimum to a deeper one. That is, after the first mutation is performed, 
additional mutations are performed 

ai = ai + Vp (^r - ^ , p = 2, . . . , iV,„ut , (5.52) 

with mutation rate rjp and probability Pp. Each time a new mutation is performed, a different 
bit ai of the chromosome chain is selected at random. 

Second, we have introduced a probabilistic selection for the mutated chromosome chains, in- 
stead of the basic deterministic selection. This is helpful in allowing for mutations which only 
become beneficial after a combination of several individual mutations, and allows for a more 
efficient exploration of the whole space of minima. Once one has the mutated population of 
iVtot individuals, instead of selecting the iVchain individuals with smallest x^j one selects first 
the chromosome with smallest x^j with value (^i)i ^^'^ then selects the remaining iVchain 
individuals according to the probability 

D^2. /-(X^(a0-X^(ai))\ . 

Pt [X (a<)) = exp I ^ I , 1 = 2,..., A^tot , (5.53) 

where T is the temperature of the system, in analogy with the Metropolis algorithm in Monte 
Carlo simulations |113j . At high temperatures even the chromosomes with bad fitness have 
some probability to propagate to the next generation, while at low temperature one recovers 
the deterministic selection rule. With this selection criteria, individuals with accidentally bad 
fitness but with relevant schemata can propagate to the next generations. Of course, after 
a number of generations only the individuals with good fitness will propagate through the 
generations. 



• Conjugate Gradient 
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Conjugate gradient minimization exploits in an efRcient way the information on both the 
function to minimize {^) and its gradient, Vx^ (a), where as in Eq. 15.451 a stands for the 
set of weights and thresholds that characterize the neural networks. Starting with an arbitrary 
initial vector, go and letting hp = go, the conjugate gradient method constructs two sequences 
of vectors from the recurrence 

gi+i = gi - Ai A • hi, hi+i = gi+i + 7ihi, 1 = 0,1,2,... (5.54) 

where we have assumed that the function to be minimized can be approximated by a quadratic 
form 

X^(a)-C-B-a+ia-A-a + C'(a3) , (5.55) 
and the scalars are defined as 

7. = i^+ilto, (5.57) 

and the dimension of each of the above vectors is A'par, the size of the parameter space. With 
these definitions the following orthogonality and conjugacy conditions hold, 

gj • gj = 0, hj • A • hj = 0, g, • = 0, j <i ■ (5.58) 

If one knew the Hessian matrix A in Eq. 15.561 the conjugate gradient method would take us 
to the minimum of (a), but in general this Hessian matrix is not available. However, it can 
be proven the following property: suppose we happen to have gi = — Vx^ (a^) at some point 
ai of the parameter space. Now we proceed from the point a^ along the direction to the 
local minimum of x^ located at the point a^+i and then set g^+i = — Vx^ (a^+i). Then the 
vector gi+i constructed this way is the same that the one that would have been constructed 
from Eq. 15.541 if the Hessian matrix had been known. 

Once we have constructed a set of conjugate directions, the minimization of the function x^ (a) 
is straightforward: starting from an initial point in the parameter space ai, one has to find 
the quantity that minimizes 

X^(afc + afcgfc) (5.59) 



and then one sets 



afc+i = afc + akgk (5.60) 



and this procedure is repeated from fc = 1 to fc = -/Vpar- If the fimction to be minimized 
was a quadratic form, conjugate gradient minimization would find the minimum in a single 
iterations. Obviously, for real functions the length of the minimization is typically larger. 
Conjugate gradient minimization is also a suitable minimization algorithm for nonlinear error 
functions, and the main reason for its efficiency is the optimal use of the information contained 
in the gradient of the function to be minimized Vx^- 

Once we have described the different minimization algorithms that will be used for neural network 
training, we discuss the weighted training procedure. The essential idea about weighted training 
is that during the neural network training some experimental points are given more weight in the 
error function than others. This is useful for example to learn in a more efficient way those data 
points with a smaller error. In weighted training, one separates the iVdat data points into A'sots 
sets, each with NiJ = 1, . . . , A'^gots data points. A typical partition is that each set corresponds 
to each different experiment incorporated in the fit, iVgets = -^exp, but arbitrary partitions can be 
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considered, like different kinematical regions. The cross-correlations between points corresponding 
to different sets are neglected. The weighted error function that is minimized is then 



1 



(5.61) 



1=1 



where zi is the relative weight assigned to each of the sets and xf is the error function, either Eq. 
15.331 or 15.341 for the data points that belong to the 1-th set. There are several ways to select the 
relative weights z; of each experiment. A possible parametrization of the values of zi is 



zi = ai 



Xi 



(5.62) 



where x 



2 

max 



maxilxf} and where a;, 6/ are to be determined in a case-by-case basis. If 6; = the 



relative weights of each set is kept fixed during the training, and if 5/ 7^ the relative weights can 
be dynamically adjusted during the training so that sets with larger have associated a higher 
relative weight zi. Note that the final x^, Eq. 15.761 is the standard unweighted one, and that the 
minimization of Xminim during the neural network training training is only a useful strategy to obtain 
a more even distribution of xf among the different data sets. In Fig. 15.31 we show with an example 
of Section f6.4.2l the parametrization of the nonsinglct parton distribution qNsi^i Qa)i ^cm weighted 
training helps in obtaining more similar values for the of the different experiments incorporated 
in the fit. 
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Figure 5.3: 

A comparison of a training to experimental data with and without weighted training (WT). As can be seen 
in the figure, at the end of the neural network training, the of the two experiments incorporated in the 
fit (BCDMS and NMC, see Section f6.4.21 are more similar in the case of weighted training than in the case 
of unweighted training. 

An important issue to optimize the efficiency of the neural network training is the choice of 
the optimal architecture or topology of the neural network. The choice of the architecture of the 
neural network (the number of layers L and the number of neurons n; in each layer) cannot be 
derived from general rules and must be tailored to each specific problem. An essential condition is 
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that the neural network has to be redundant, that is, it has to have a larger number of parameters 
than the minimum number required to satisfactorily fit the patterns we want to learn, in this case 
experimental data. However, the architecture cannot be arbitrarily large because then the training 
length becomes very large. A suitable criterion to choose the optimal architecture is to select the 
architecture which is next to the first stable architecture, that is, the first architecture that can fit 
the data and gives the same fit than an architecture with one less neuron. This way one is confident 
that the neural network is redundant for the problem under consideration. 

A more systematic approach to determine the optimal architecture of a neural network which 
parametrizes a function F is related to how many terms are needed in an expansion of F in terms 
of the activation function g. Starting from a large neural network, we can reduce the architecture 
up to the optimal size with the weight decay approach, in which weights which are rarely updated 
are allowed to decay according to 

= - ' (5-63) 

where e is the decay parameter, typically a small number. This corresponds to adding an extra 
complexity term to the error function |114| . 



X^-X^ + ^E-r> (5-64) 



that is, larger weights lead to larger contributions in the error function. A more advanced complexity 
terms is 



(02/, ,2 



X^^X^ + A^^^, (5.65) 

which again penalizes those weights whose absolute value is larger. With this technique, the net- 
work gets forced to only contain the weights that are really needed to represent the problem under 
consideration. 

For each training, the parameters that define the behavior of the neural networks, its weights 
and thresholds, are initialized at random. To explore in a more efficient way the space of minima, 
it has been checked to be specially useful to initialize randomly not only the values of the neural 
network parameters but also the same range in which these parameters are initialized. That is, the 
parameter initialization range is determined at random between 

[-{lj) -a^,{uj)+a^] , (5.66) 

and 

[-{uj) +a^,{Lj) -a^] , (5.67) 

where {to) is the average value of the neural network parameters and a^^ is the associated variance. 

It has to be taken into account that a minimization strategy is defined not only by an algorithm 
to vary parameters so that a given quantity is minimized, but also by a convergence condition 
that determines when the minimization is stopped. Several stopping criteria, each with its own 
advantages and drawbacks, have been considered during the course of this thesis. 

A first criterion is the dynamical stopping of the training. With this criterion, for each replica, 
we stop the training either when the condition E^''^ < Xg^op is satisfied or, if the previous condition 
cannot the fulfilled, when the maximum number of iterations of the minimization algorithm iVmax 
is reached. That is, the length of the neural network training is different for each replica. The 
value of the xltop parameter is determined by the value of the total x^, Eq- 15.761 that defines the 
parametrization. The dynamical stopping of the training allows to avoid both overlearning and 
insufficient training, as discussed in detail below. 
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The method to define the optimal value for iV,„ax is the following: first perform a training with 
a very large value for Na^a.^- For iVi-op replicas, the condition E'^'^^ < Xstop '^i^l i^ot be satisfied due 

(k) 

to statistical fluctuations in the generation of the data replicas. If E}^' is the final value of the 
error function at the end of the training of the k-th replica, which has not reached the dynamical 
stopping criterion, then determine for each replica which was the iteration it for which the condition 

E^'l <{l + r^^) , A^f ^ < iV„,ax , (5.68) 

it 

was verified, were is the required tolerance. The new and optimal value for A'max is determined 
then as the average value of such iterations. 

7V,„a, = J2Ni^\ (5.69) 
^^'■<=p k=i 

where the sum is over those replicas which have not been dynamically stopped. With dynamical 
stopping of the training, the final distribution of error functions i?*-'^-' is even and there is no problem 
of overlearning, as will be discussed in brief. 

An alternative criterion to stop the neural network minimization is fixing the maximum number 
of iterations of the minimization algorithm to a value large enough so that once is sure that within 
a given tolerance the fit has converged. However, this approach does not take into account that the 
length of the training of each replica depends on each precise replica, so it is not computationally 
efficient and might lead to overlearning. 

The rigorous statistical method to determine the value of Xg^op with dynamical stopping of the 
training is the so called overlearning criterion. Since the number of parameters of the neural network 
parametrization is large, in principle, without the presence of inconsistent data, the final could 
be lowered to arbitrarily low values for a large enough neural network. The overlearning criterion 
to determine the length of the training states that the training should be stopped when the neural 
network begins to overlearn, that is, it begins to follow the statistical fluctuations of the experimental 
data rather than the underlying law. The onset of overlearning can be determined by separating the 
data set into two disjoint sets, called the training set and the validation set. One then minimizes 
the error function, Eq. 15.341 computed only with the data points of the training set, and analyzes 
the dependence of the error function of the validation set as a function of the number of iterations. 

Then one computes the total x^j Eq. 15.761 for both the training, Xtn ^^'^ validation, Xvaii 
subsets. It might turn out that statistical fluctuations are large, and one has to average over a large 
enough number of partitions to obtain stable results. The onset of overlearning is determined as the 
point in the neural network training such that the Xvai of the validation set saturates or even rises 
while the Xtr oi the training set is still decreasing. This implies that the neural network is learning 
only statistical fluctuations, and signals the point where the training should be stopped. 

A drawback of this approach is that one needs to assume that the training subset reproduces 
the main features of the full set of data points. While this is the case in global fits of parton 
distributions, where in each kinematical region there are several overlapping measurements, in other 
physical situations experimental data is much more scarce and the overlearning criterion cannot be 
used to determine the length of the training. 

In Fig. 15.41 we show the expected behavior for the onset of overlearning. One can see that 
while the x^ of the validation set begins to increase, the x^ of the training set is still decreasing. 
This point signals the onset of overlearning, that is, the fact that the neural network is learning 
statistical fluctuations rather than the underlying law in the experimental data. Note also that for 
some problems, like for example those in which the data points are very dense, overlearning could 
not be possible. 

An alternative criterion to determine the optimal x^ which defines the neural network training 
is the so-called leave-one-out strategy |115| . In this strategy, out of the total iVdat data points. 
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Figure 5.4: 

An example of a neural network learning process with overlearning. The point where the a-s computed 
with the validation set begins to rise while the of the training set is still decreasing is the sign of this 
overlearning. 

one selects one data point at random and leaves it out of the training of all the remaining points. 
One computes then the prediction of the neural network for this point that has been left out. This 
procedure is repeated over all the data points, and the total as computed with the predictions 
for those points that have been left out is the average for a pure neural network prediction for a 
point in the same range. If the underlying law followed by the data points is clear, then the of a 
prediction can be as good as the total x^ of the trained data points, and allows to determine which 
is the optimal value of the total x^ to aim for in the neural network training. The conditions for the 
leave-one-out strategy can be modified to take into account different types of correlations between 
the data points, for example choosing sets of independent points at random instead of single data 
points. 

5.2.4 Implementation of theoretical constraints 

In addition to experimental measurements, in general we have additional theoretical information 
concerning the function we want to parametrize, like for example kinematical constraints. Since this 
theoretical information is usually essential to determine the shape of the function that is going to be 
parametrized, it must be incorporated in our fit. In this section we discuss the different approaches 
to incorporate theoretical information in our neural network fits. These theoretical constraints are 
specially useful in general to constrain the shape of the parametrization in regions where there is no 
experimental data. The methods used in the present thesis have been the following: 

1. The Lagrange multipliers method: 

If the function to be parametrized, F has to verify a set of A^con theoretical constraints J-i [Fi] = 
j-{thco) ^ one can use the Lagrange multiplier technique, which consist on minimizing for each 
replica the standard error function while constraining the function to satisfy the theoretical 
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requirements, 



where Xdat contribution from the experimental errors, Eqns. l5.33l or f5.34l and where the 

Lagrange multipHers Xi are computed via a stabihty analysis. They have to be such that the 
constraints are imposed with the required accuracy but at the same time the space of minima 
of Xtot ^ has to be close to that of the data error function, Xdit'' ■ 

2. Pseudo-data: 

If the constraints to be imposed of the function F are local, they can be implemented as if 
they were data points. That is, if the theoretical constraint is 



2(fc) _ 2{k) 
Xtot Xdat 



A'con 

E 

i=l 



A,; UF,. 



p{nct){k) 


^(thco) 







(5.70) 



F{xi,X2, 

then the minimized quantity should be 



•) 



(5.71) 



2(fc) 
Xtot 



2{k) 
Xdat 



Wcon 

E 



1 



^(nct)(fc) ^^^^ 



X2, 



(5.72) 



where aj is the accuracy with which the corresponding constraint must be satisfied. This 
approach differs from the Lagrange multiplier approach in that the accuracy to which the 
constraints are to be satisfied is fixed by several requirements, for example, by requiring the 
error ai to be a fixed fraction of the average error of the experimental data points. Note that 
this is equivalent to adding to the experimental data set artificial data points with central 
value F {xi, X2, . . . , xj , . . .) — bj and total uncorrelated error aj. The addition of artificial 
data points is helpful to constrain the shape of the parametrized function in regions without 
experimental data, for example near kinematical thresholds. 

Hard-wired parametrization: 

Another method is to hard-wire the constraint in the neural network parametrization. This 
method is specially useful to implement the vanishing of the function for some kinematical 
region. If we have 

F{xi,X2,.-.,Xj,...) ^0 , (5.73) 
then we redefine the function to be parametrized to be 



F("<=*)W {XI,X2, 



n 



_ sr. 
Xj) 



i ^(nct)(fe) 

r [xi,X2,. 



■) 



(5.74) 



where now is the function the one that is going to be parametrized with a neural 

network. The introduction of a partial functional form dependence in the parametrization 
does not mean that there is any functional form bias, since one can check the the results of the 
fit do not depend on the values of rij, provided they verify Uj > 0. The default values of the 
Uj parameters have to be determined from a stability analysis to assess which is the optimal 
neural network training preprocessing. 

All of the above techniques can be combined for a given parametrization. Using any of these 
techniques, it has to be shown that in each case the result of the fit is modified as expected by the 
implementation of the kinematical constraints. For example, the dominant contribution to Xtot has 
to be always that of the experimental data points. 
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5.3 Validation of the results 

The third step of our strategy consists on the vahdation of the resuhs of the neural network training 
using suitable statistical estimators. These results are defined by the sample of trained neural 
networks which constitutes the sought-for probability measure for the observable F, V [F], from 
which expectation values for an arbitrary functional of it, can be computed using 

1 

{T[F]) = / VFV [F] T[F] = —— J2 , (5.75) 

just as with standard probability measures. We define now such estimators, that can be divided 
into two categories: statistical estimators to assess how well the sample of trained neural networks 
reproduce the features of the experimental data, and statistical estimators which assess the stability 
of a given fit with respect some parameters, for example the number of trained neural networks. 



5.3.1 Statistical estimators: probability measure vs. experimental data 

Now we describe the statistical estimators that we use to assess the quality of the constructed 
probability measure in the space of the observable F by comparing with the experimental data. The 
goodness of the final fit is measured with the total x^) as constructed from the full experimental 
covariance matrix 




(5.76) 



where now the total experimental covariance matrix Eq. 15.61 includes the contribution from the 
normalization uncertainties. If experimental errors were correctly estimated and there were no 
incompatibilities between measurements, on statistical grounds one would expect ~ 1, with 
deviations from this value scaling with the number of data points as l/\/Ndat- To be precise, 
instead of TVdat one should have the number of degrees of freedom A'^dof , the difference between the 
number of data points and the number of free parameters of the theoretical model A^par, and the 
standard deviation of the distribution is given by a^2 — ^2/iVdof. 
Another important estimator is the average error, 

^^^<=P fc=i 

where E^'^^ is either Eq. l5.33l or Eq. 15.341 depending on the estimator which has been used in the 
neural network training. It is instructive to estimate the typical values that the average error Eq. 
15 .771 can take. We will compute now Eq. 15. 771 in a simplified model, in which correlated systematics 
are neglected. Let us consider a set of measurement mi of F. Then if one assumes that experimental 
measurements are distributed gaussianly around the true value ti of the observable, one has 

mi = + aiSi , (5.78) 

where at is the total error and Si a univariate zero mean gaussian number. The k-th replica of 
generated artificial data g^'^-' is as in Eq. 15. 31 without correlated uncertainties, 

) = m, + rf V, - + (s, + rf V* , (5-79) 

where r*^ is another univariate zero mean gaussian random number. Now let us assume that the 
best fit neural networks are distributed around the true values of the observable F with error ai. 
For the k-th neural network n^'^-' we will have 

nf) =t, + /f)a, , (5.80) 
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(k) (k) 

where are highly correlated to both rl and ti. For a large enough set of measurements, and a 

(k) 

large enough number of generated replicas, so that correlations between s; and r - can be neglected, 
it can be seen that the average error is given by. 



(^>rcp 



(5.81) 



dat 



where the error function is the diagonal error, Eq. 15.331 Note also that within the same model, the 
error function of the k-th net as compared to the experimental measurement, defined as 



E. 



(k) 



1 



A^dat 
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(5.82) 



has as average 



E) ^ 1 ■ 

rep 



> 1 



(5.83) 



dat 



and finally in the case that the neural network prediction coincides with the true value of the 



function, (Ji = 0, the average error is (E) =1, just as expected from textbook statistics. 

\ / rep 

Now we define the estimators which are used to assess how well the sample of trained neural 
networks reproduce the sample of experimental data. These estimators are 

• Average for each experimental point 
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^ ^ ^(not)(fe) 
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(5.84) 



• Associated variance 
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Associated covariance 



(not) 
Fij 



p(net) p(nct) 
i j 



F, 



(net) 



F. 



(net) 



(net) (net) 



(5.86) 



(net) (net) (net) (net) 

COV-, = 'a- 'a) 



'I] ^ i^ij "I "] ■ (5.87) 
As in the case of the artificial replicas, the three above quantities provide the estimators of the 
experimental central values, errors and correlations as computed from the sample of trained 
neural networks. 



• Mean variance and percentage error on central values over the A^dat data points. 
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We define analogously the mean variance and the percentage error for errors, correlations and 
covariances. 
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Scatter correlation 
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^(cxp) (net) 
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, (5.90) 



where the scatter variance CTs""*'' associated to the neural network prediction is defined as 



We define analogously r[cr("''*^], r[p("''*^] and r[cov("°*^]. 
• 7?.-ratio 

where (E) is given by Eg. 15.771 with covariance matrix error, and for E^''^ one uses the analog 
of Eq. 15.821 including correlated errors, 



E 



-— (5.93) 

-'''rep 



fc=l 

^(fe) ^J-J2 (i^^""*^^'-^ - Ff^) cov^^K/ (^pi--m) _ ^(-P)) . (5.94) 

The 7?.— ratio is interesting since it provides an estimator which is capable to determine whether 
or not when error reduction is observed it is due to the fact that the neural network by 
combining data points has found the underlying law or whether in the other hand it is due to 
artificial smoothing. To verify this property, assume that correlated systematic uncertainties 
can be neglected, as we have done in the example above, then the estimator reads 



1 - 

SO that if error reduction is due to the fact that the neural network has found the underlying 
law that describes the experimental data, one will have <C cr and therefore the 7?.-ratio will 
satisfy 7e ~ 1/2. 

Note that even if the covariance matrix is determined from the correlation matrix and the total 
error from Eq. 15.71 the reproduction of covariances by the probability measure has to be validated 
independently of correlations and error. This is so because even if it is clear that when correlations 
and errors are correctly reproduced, also covariances will be reproduced, the inverse is not necessarily 
true. 



5.3.2 Statistical estimators: stability of the probability measure 

Let us define the statistical estimators that are used to analyze the compatibility and stability of 
the constructed probability measure with respect of some of the parameters that define the fit. 
These estimators are also useful in order to perform a quantitative comparison between fits, that is, 
between probability measures. For example, we know how the different estimators depend on the 
size of the sample iVrcp in the replica generation, but we do not know if the same behavior holds 
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for the sample of trained networks. The estimators in the above section were used to compare the 
results of either the replica generation or the neural network fit to experimental data. In this section 
we want to define statistical estimators which allow us to compare a given fit with another fit. Since 
in practical applications we will have a reference fit, and compare other fits with it, let us label the 
first fit as the reference fit and is denoted by the superscript (ref) and the second fit as the current 
fit, denoted by the superscript (fit). 

These estimators are divided in estimators for central values, standard deviations and correla- 
tions. The estimators are computed for a set of TV^at points which need not to be the same points 
where there is experimental data. In particular one can compute these estimators in the extrapola- 
tion region, to check the stability of the fit also in the region where there is no experimental data. 
These estimators are given by: 

• Central values: 

— Relative error: 
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— Scatter correlation: 
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• Standard deviations: 
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• Correlations: 
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Scatter correlation: 



r [p] = '"^""^ ^ ^^^^^ ^^^^ . (5.101) 



were averages and variances of central values, errors and correlations can be computed using the 
expressions in Appendix lA.ll Two probability measures are said to be equivalent to each other if 
the conditions (RE[i^])j^j < 1, (RE [cr^])^^^^ < 1 and {RE[p])^^^ < 1 are fulfilled. For the scatter 
correlations one expects for two compatible probability measures the condition r < 1 to hold. Note 

that the iVdat points where the stability estimators defined above are computed do not need to be 
those were there is experimental data, for instance, these estimators can be used to compute the 
stability of a probability measure also in the extrapolation region. Since fluctuations might be large, 
it might be needed to average over different partitions of the sets of trained neural networks to 
obtain stable results. 

Another relevant estimator is the so called pull. The average pull of two different fits is defined 

as 

For two fits to be consistent within the respective uncertainties the condition P,; < 1 should be 
satisfied. Note that the condition < 1 is necessary for two fits to be compatible within errors, but 
it is not sufficient for two fits to define the same probability measure, since in particular, if either 
(j(fit) or (t''°^' is much larger than the other error, then the two fits will be very different yet the pull 
will still satisfy Pi < 1. 

Finally, there are other conditions that must be verified for the sample of trained neural networks 
for a given fit to be considered as satisfactory. These conditions are related to the criteria used to 
stop the minimization of the individual replicas. If we use dynamical stopping of the training, as 
described in Section Hi. 2. 21 the distribution of errors E'^'^' at the end of the training over the trained 
replica sample must be peaked around the average result (E) , Eq. 15.771 because the opposite case 
would mean that the averaged result is obtained combining good fits with bad fits (in the sense of 
fits with large values of E^'^^). Another estimator of the consistency of the results is the distribution 
of training length, where the training length measures the length of the minimization procedure, 
cannot be peaked near A'^ax, the maximum number of iterations, because if it is this means that 
the dynamical stopping of the minimization is not being effective, and one has instead fixed training 
length minimization. In Section 16.31 we show how these two conditions are satisfied in a particular 
example. 
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The neural network approach: 
Applications 

In this Chapter we review four apphcations of the general strategy to parametrize experimental 
data that has been described in the previous Chapter. First we describe a parametrization of the 
vector-axial vector spectral function pv-a{s) from the hadronic decays of the tau lepton which 
incorporates theoretical information in the form of sum rules, with the motivation of extracting 
from this parametrization the values of the nonperturbative vacuum condensates. Then we present 
a parametrization of the proton structure function (x, Q^), which updates the results of Ref. [TT] 
by including the data from the HERA experiments. This parametrization is not only interesting as a 
new application of the general technique, but also it allows us to develop the necessary techniques to 
apply our general strategy to problems with data coming from many different experiments. The third 
application is a parametrization of the lepton energy distribution dT{Ei) / dEi in the semileptonic 
decays of the B meson. As a byproduct of this parametrization we will provide a determination of 
the b quark mass rrif, (nib)- 

We devote special attention to the last and most important application, which is the main motiva- 
tion for the development of the techniques described in Chapter the neural network parametriza- 
tion of parton distributions. To this purpose a new strategy to solve the DGLAP evolution equations 
is introduced. We then review the parametrization of the non-singlet parton distribution qNsi^i Qo) 
from experimental data on the non-singiet structure function F.^^ {x, Q^), and we devote special at- 
tention to the comparison of our results with those of the standard approach to parton distributions 
described in Section 

6.1 Spectral functions in hadronic tau decays 

In this Section, we describe the first application of the technique introduced in Chapter [S] to 
parametrize experimental data that was implemented during the present thesis. We construct a 
parametrization of the vector minus axial- vector spectral function py_yi(s) from the hadronic de- 
cays of the tau lepton. As has been described in Section [4. 1.41 this spectral function is determined 
from purely nonperturbative dynamics, so it is specially suited for the determination of non perturba- 
tive parameters like the chiral vacuum condensates. The motivation to determine a parametrization 
of the spectral function pv-a{s) is thus to provide an extraction of the QCD vacuum condensates 
(Oe), (Os) and higher dimensional condensates from experimental data. A more detailed description 
of this work can be found in Ref. T . 

Following the strategy described in the previous Chapter, the parametrization of the spectral 
function pv~a{s) will combine all available experimental information, that is, central values, errors 
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and correlations, together with information from theoretical constraints: the chiral sum rules, Eqns. 
14. 6214 .631 and the asymptotic vanishing of the spectral function in the perturbative s oo limit, 
Eq. 14.561 This way we will obtain a determination of the QCD vacuum condensates which is 
unbiased with respect to the parametrization of the spectral function and with faithful estimation 
and propagation of the experimental uncertainties. All sources of uncertainty related to the method 
of analysis are kept under control, and their contribution to the total error in the extraction of the 
condensates is estimated, as discussed in 

Since the relevant spectral function for the determination of the condensates is the pv-Ais) 
spectral function, we need a simultaneous measurement of the vector and axial-vector spectral 
functions from the hadronic decays of the tau lepton. Data from the ALEPH Collaboration jll6| 
and from the OPAL Collaboration 1117' will be used^ , which provide a simultaneous determination 
of the vector and axial vector spectral functions in the same kinematic region and also provide the 
full set of correlated uncertainties for these measurements. There exists additional data on spectral 
functions coming from electron-positron annihilation, but their quality is lower than the data from 
hadronic tau decays and will not be incorporated to the present analysis. 

Experimental data does not consist on the spectral function directly, but rather on the invariant 
mass-squared spectra for both the vector and axial- vector components, that are related to the 
spectral functions by a kinematic factor and a branching ratio normalization. 



PV/Ais) 



M2 B{t- ^V-/A-i^,) 1 dNv/Ais) 



In the above equation. 



e Ve^r) Ny/A ds 

1 dNv/A{s) 



s 



2s 



V/A 



ds 



s < Af2 
(6.1) 

(6.2) 



is the normalized invariant mass distribution, is the tau lepton mass, s is the invariant mass of 
the hadronic final state and Sew s-re the electroweak radiative corrections. 
In the following Py^\ j will denote the i-th data point. 



„(oxp) 
Pv-A.i 



PV-Aisi) = Pv(Sj) - pAisi) 



1, 



dat 



(6.3) 



where N^^t is the number of available data points. Fig. 16. II shows the experimental data used in the 
present analysis from the two LEP experiments. Note that errors are small in the low and middle 
s regions and that they become larger as we approach the tau mass threshold. The last points are 
almost zero in the invariant mass spectrum, since near threshold the reduced phase space implies a 
lack of statistics, and are only enhanced in the spectral functions due to the large kinematic factor 
for s near (the last term in Eq. 16.1(1 , so special care must be taken with the physical significance 
of these points. 

It is clear that the asymptotic vanishing of the spectral function. 



lim pv~a{s) -> 



(6.4) 



implied by perturbative QCD is not reached for s < AI^, at least for the central experimental 
values, and therefore should be enforced artificially on the parametrization of the Pv~a{s) spectral 
function that we will construct. The method that we use takes advantage of the smooth, unbiased 
interpolation capability of neural networks: artificial points are added to the data set with adjusted 

^In this analysis we used the ALEPH data as presented in Ref. 11161 . Recently, the ALEPH collaboration released 
their final results on hadronic spectral functions and branching fractions of the hadronic tau decays 11181 , which have 
reduced uncertainties due to the larger statistics, specially in the large s region. It would be interesting to repeat the 
present analysis with this updated experimental data on the Pv-a{s) spectral function, as has been done with other 
physical applications S5, . 
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Figure 6.1: 

Experimental data for pv-a(s) spectral function from the ALEPH (left) and OPAL (right) collaborations. 
Note that errors increase as we approach the kinematic threshold at s = M^. 



Pv-a{s) 



iVrep 


10 


100 


1000 




0.98 


0.99 


0.99 




0.98 


0.99 


0.99 


j.[p{art)] 


0.61 


0.95 


0.99 



Table 6.1: 

Comparison between experimental data the Monte Carlo sample of artifical replicas for the pv-a(s) spectral 
function. 

errors in a region where s is high enough that the pv-a{s) spectral function should vanish, as 
discussed in Section [^.2. 41 That is, we add to the experimental data a set of artificial points, 

Pv-A{si) =0, i = iVdat + 1, . . . , iVtot , Si>M^, (6.5) 

with error ai , and where A'tot is the total number of points (data and artificial) . Once these artificial 
points are included, the neural network will smoothly interpolate between the real and artificial data 
points, also taking into account the constraints of the sum rules, as explained below. 

As has been described in Chapter El the first step in the parametrization of the spectral function 
Pv-a{s) is the generation of a Monte Carlo sample of replicas of the experimental data. The 
experimental data for the invariant hadronic mass spectrum consist on the central values, the total 
error and the correlations between different invariant mass bins. Therefore, to generate replicas of 
the experimental data, we use Eq. 15.131 which in the present case reads 

P^vV = + ^^'^^tot. , = 1, . . . , 7V,ep , (6.6) 

(k) 

where the gaussian random numbers have the same correlation as experimental data. The 
replica generation has the statistical estimators that can be seen in Table I^TI It can be seen that a 
Monte Carlo sample with iVi-op = 1000 is required to have scatter correlations larger than 0.99 for 
both central values, errors and correlations. 

Once the Monte Carlo sample of replicas has been generated, following the method described 
above, one has to train a neural network in each replica. The estimator which will be minimized in 
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the present case is the diagonal error Eq. 15.331 which reads 



/ (art)(fe) (not)(fc)\ 



2 



=^T. ^ ' 2 ' ' ^ fc=l,...,Wrep , (6.7) 

where Py^^'f^ is the k-th neural network, trained on the k-th replica of the experimental data. Note 
that as explained in correlations are correctly incorporated in the parametrization of pv-a{s) 
through the Monte Carlo pseudo-data generation, Eq. 16.61 As has been described in Section Pf. 1.41 
the parametrization of the pv-a{s) spectral function has to satisfy several theoretical constraints. 
These theoretical constraints, the chiral sum rules, are implemented using the Lagrange Multipliers 
technique, as described in Section [5.2.41 Therefore the total error function to be minimized will be 



1=1 

where for example, for the first theoretical constraint, the DM0 sum rule, Eq. 14.621 one has 

-1 />oo (^ct)(/c)/ \ r / 2\ 

p^^lTi^)] - A / ds ^"^'^ = - Fa , (6.9) 

J 47r ./n s 3 



^1 



and similarly for the remaining chiral sum rules, up to A^con — 4. These sum rules act as constraints 

(k) 

on the neural network output, that is, the main contribution to the error function E^^l (which 

. (k) 

determines the learning of the network) still comes from the experimental errors, that is, the E2 
term, and the sum rules are only relevant in the region where the errors are larger. The relative 
weights of the chiral sum rules Xi are determined according to a stability analysis, as discussed in 

n 

The length of the training is fixed by studying the behavior of the error function E^l^l for the 
neural net fitted to the central experimental values, and asking that E^^l stabilizes to a value close to 
one, which on statistical grounds in the value expected for a correct fit. The minimization algorithm 
that is used for the neural network training is Genetic Algorithms, introduced in Scction r5.2.3l which 
is required since the total error function to be minimized, Eq. 16.81 depends non-linearly with the 
pv-Ais) spectral function through the convolutions of the chiral sum rules. 

With the strategy discussed above, A'rcp neural networks are trained on the A^rop Monte Carlo 
replicas of the experimental data for the spectral function pv-a{s)- As has been described in Chapter 
El once the probability measure in the space of spectral functions V [pv-a] bas been constructed, 
it is crucial to validate the results with suitable statistical estimators. A number of checks is then 
performed in order to be sure that an unbiased representation of the probability density has been 
obtained. The values for the scatter correlations for central values, errors and correlations are 
presented in Table [^7^ It is seen that the central values and the correlations are well reproduced, 
whereas this is not the case for the total errors. The average standard deviation for each data point 
computed from the Monte Carlo sample of nets is substantially smaller than the experimental error. 
This is due to the fact that the network is combining the information from different data points 
by capturing and underlying law. This effect is enhanced by the inclusion of sum rules constraints. 
All networks have to fulfill these constraints which forces the fit to behave smoothly in a region 
where experimental data errors are very large. This should be understood as a success of the fitting 
procedure. 

The constructed probability measure for the pv~a{s) spectral function has built-in the theoretical 
constraints for the chiral sum rules. For example, it can be checked that the two Weinberg chiral sum 
rules are well verified by our neural network parametrization, and thus have been incorporated to the 
information contained on the experimental data. This fact will be crucial because different extraction 
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Pv-a{s) 



N 


10 


100 


r[py_^(s)(-'=t)] 


0.98 


0.98 




-0.21 


-0.20 




0.80 


0.85 



Table 6.2: 

Comparison between experimental data and the averages computed from tire sample of trained neural net- 
works for the pv-a{s) spectral function. 



methods, differing in combinations of these chiral sum rules, can be shown to be equivalent in the 
asymptotic region sq oo. In Fig. IS. 21 the two Weinberg sum rules, Eqs. 14.631 and 14.641 evaluated 
with the neural network parametrization of the spectral function pv'_^(s) are represented. Both 
chiral sum rules are well verified in the asymptotic region, beyond the range of available experimental 
data. This also will ensure the stability of the evaluation of the condensates with respect to the 
specific value of sq chosen as long as it stays in the asymptotic region. 



0.008 
0.006 
0.004 



1.5 2 2.5 3 3.5 

s_0 [GeV"2] 



Figure 6.2: 

The two Weinberg chiral sum rules, Eqs. 14.631 and 14.641 evaluated from the neural network parametrization 
of pv_a(s). Only central values are shown. 

Using the neural network parametrization of the pv-A spectral function, we can compute any 
given sum rule with associated uncertainties. Because the neural parametrization retains all the 
experimental information, we can view values coming from the neural networks as direct experimental 
determinations of convolutions of the spectral function pv-a{s). The value of the condensates 
(Og) , (Os) and higher dimensional condensates are then extracted from the value of an appropriate 
sum rule, to be discussed in brief. The method we will follow is the evaluation of the vacuum 
condensates as a function of the upper limit of integration for each replica and compute the mean 
and standard deviation. As has been explained before, it is crucial to represent the value of the 
different sum rules as a function of the upper limit of integration, to check both its convergence and 
its stability. 

Once the neural network parametrization of the pv-Ais) spectral function has been constructed 
and validated, we can use it to determine the nonperturbative chiral vacuum condensates. These 
condensates can be determined by virtue of the dispersion relation from another sum rule, that is, a 
convolution of the pv-a{s) spectral function with an appropriate weight function. As discussed in 



The neural network approach to parton distributions 



85 



Section [4. 1.41 we define the operator product expansion of the chiral correlator in the following way 



00 cx: ^ 

m')\v-A - E 7p;^T^^2„+4(g^M')(02„+4(M')) ^ E (^2^+4) . (6.10) 



n—l n—1 

The Wilson coefficients, including radiative corrections, are absorbed into the nonperturbative vac- 
uum expectation values, to facilitate comparison with the current literature. As has been explained 
in Sect. 14. 1.41 the analytic structure of the chiral correlator implies that it has to obey the dispersion 
relation, 

Recalling that the imaginary part of the chiral correlator is proportional to the spectral function, 

Imny_^(s) - ^py_^{s) , (6.12) 

comparing terms of the same order in the 1 / expansion in Eq. 16.101 and in Eq. 16.111 it can be 
seen that condensates of arbitrary dimension are given by 

/•so I 

{02n+2) - (-1)" / dss'—pv-Ais) , n > 2 , (6.13) 

which, if the asymptotic regime has reached, should be independent of the upper integration limit 
for large enough sp- 

As long as all previous integrals have to be cut at some finite energy sq < -^^r i since no exper- 
imental information on the pv-a{s) spectral function is available above M^, a truncation of the 
integration should be performed that competes with all other sources of statistical and systematic 
errors, introducing a theoretical bias which is difficult to estimate. Many techniques have been 
developed to deal with this finite energy integrals, leading to the so-called Finite Energy Sum Rules 
(FESR) . A detailed analysis of some alternative techniques and methods to extract the condensates 
can be found in the original work |Tj. 

Using Eq. 16.131 we can extract from our neural network parametrization the values of the 
nonperturbative condensates. To be explicit, one would compute the dimension 6 condensate from 
the sample of trained neural networks in the following way, 

(^a) = ^ E {Of^) = ^ E r dss-^J^}-Hs) . (6.14) 

Stable results are obtained for the dimension six condensate (Og) whereas higher condensates e. g. 
(Cg) are less stable. Fig. 16.31 shows the outcome for (Og) and (Og) including the propagation of 
statistical errors. It is clearly seen that convergence in the limit of integration sq is obtained due 
to the addition of sum rules and endpoints in the learning procedure. The central values for the 
condensates in the asymptotic limit, that is, in the limit in which sq oo, come out to be: 

(Og) = -4.2 10-3 GeV^ , 

(Og) = -12.7 10^3 GeV® . (6.15) 

The value of the (Og) is a cross-check of the validity of our treatment: not only there are strong 
theoretical arguments that support the fact that (Og) is negative 119l ll20j but also all previous de- 
terminations with different techniques yield negative results, being the majority of them compatible 
with ours within errors. 
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Figure 6.3: 

Condensates (Oe) and (Os) as a function of so. The error bands include the propagation of experimental 

uncertainties. 



We note that our evaluation of the condensates is compatible with some of our previous evalua- 
tions and has a similar error. This is though misleading as the error quoted here is only statistical and 
a discussion on systematic errors is needed. The discussion of the various sources of errors is crucial 
to our treatment. The first criterion to judge the reliability of a QCD sum rule is its independence, 
at large values of s from the value of the upper integration limit, that its, its saturation. We then 
need to explore the values for the final condensates which are stable against the limit of integration 
of the sum rule. This stability criterion is completed with demanding independence of the results 
on the specific polynomial entering the sum rule. Further criteria are stability with respect to the 
artificial endpoints added to the data and with respect to the relative weights in the error function 
used to train the neural networks. A detailed analysis of the contributions of the different sources 
of uncertainty to the values of the condensates can be found in 1 . These uncertainties include the 
statistical error propagation from the experimental covariance matrix, which is the best understood 
and treated error source in our analysis, the choice of the finite energy sum rule, the dependence 
on the implementation of the asymptotic vanishing of the spectral functions and the dependence on 
the implementation of the chiral sum rules. For example, in Fig. 16.41 we show that the final values 
of the condensates do not depend on the precise finite energy sum rule used in their extraction. 

Our final determination of the nonperturbative condensates including all relevant sources of 




Figure 6.4: 

Determination of (Oe) and (Og) using different finite energy sum rules, as described in Ref. 



Reference 


(Oe) X 10^ GeV 


(Og) X 10^ GeV« 


Ref. 


128 


-6.4 ± 1.6 


8.7 ±2.4 


Ref. 


129 


-6.8 ±2.1 


7±4 


Ref. 


121 


-3.2 ±2.0 


-12.4±9.0 


Ref. 


130 


-9.5 ±3 


16.2 ± 5 


Ref. 


122 


-4.45 ±0.7 


-6.2 ±3.2 


Ref. 


123 


-4± 1 


-1.2±6 


Ref. J (This work) 


-4 ±2 


-12 + ''' 


Ref. 


131 


-7.2 ± 1.2 


7.8 ±2.5 


Ref. 


126 


-7.9 ± 1.6 


11.7±2.7 


Ref. 


127 


-8.7 ±2.3 


15.6 ±4.0 


Ref. 


125, 


-2.3 ±0.5 


-5 ±3 



Table 6.3: 

Summary of different extractions of the QCD vacuum condensates. Appropriate rescaling have been per- 
formed to allow the comparison of different determinations. Note that some of the above determinations 
appeared after the original publication of Ref. 0. 



uncertainties is 

(Oe) = (-4.0 ± 2.0) IQ-^ GeV^ , 

{Os) = (-12 10-3 GeV« , (6.16) 

(Oio) = (7.8 ±2.4) 10-2 GeV^o , 
(C12) = (-2.6 ±0.8) 10"^ GeV^2 . (6.17) 

The values of the QCD nonperturbative condensates have been previously extracted from the 
experimental data, with different techniques and different results, as summarized in Table IH^ We 
include also the results of works which were published after the original publication of Ref. ^ . Note 
that our results agree, at least on the sign, with that of Refs. [HTl [HI 1123 11111123 • See [HSIITTT) 
for a detailed comparison of the different methods for the extraction of the condensates. 

Since the work presented in Ref. PP was published, there have appeared several studies which 
also determine the values of the higher dimensional condensates from experimental data using a wide 
variety of methods and techniques [HTl [T^ [TTTlin^ [T^ IH^linSl IBl] . from large Nc methods 
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to new sum rule approaches and even a determination inspired in higher dimensional string theories. 
The spread of the results obtained for the higher dimensional vacuum condensates using different 
techniques show that their determination from experimental data is still an open issue. 

Summarizing, in this part of the thesis wc have presented a determination a bias-free neural 
network parametrization of the pv-a{s) spectral function, inferred from the data, which retains all 
the information on experimental errors and correlations, and is supplemented with the additional 
theoretical input of the chiral sum rules. As a byproduct of this analysis, we have performed an 
extraction of the nonperturbative vacuum condensates (Ce) and (Os) aimed at minimizing the 
sources of theoretical bias which might be cause of concern in existing determinations of these 
condensates from spectral functions. Our final results give negative central values for the dimension 
6 and 8 condensates. These results take into account the propagation of statistical errors and their 
correlations. Higher dimension condensates carry larger errors, although the sign of the condensates 
seem to remain unaltered. 
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Figure 6.5: Kinematic range of the available experimental data for the proton structure function -F|'(a;, Q^). 

6.2 Structure functions in deep inelastic scattering 

As has been discussed in the Introduction and in Scction f4.1.2l the requirements of precision physics 
at hadron colhders have recently led to a rapid improvement in the techniques for the determination 
of parton distributions of the nucleon, which are mostly extracted from deep-inelastic structure 
functions |135| . Specifically, it is now mandatory to determine accurately the uncertainty on these 
quantities. The main problem to be faced here is that one is trying to determine the uncertainty 
on a function, i.e., a probability measure on a space of functions, and to extract it from a finite 
set of experimental data. In Ref. 11 this problem was studied in a simpler context, namely, the 
determination of a structure function and associate error from the pertinent data. This sidesteps the 
technical complication of extracting parton distributions from structure functions, but it does tackle 
the main issue, namely the determination of an error on a function. Furthermore, the determination 
of a structure function and associate error might be useful for a variety of applications, such as 
precision tests of QCD (determination of 136,, tests of sum rules) or the determination of 
polarized structure functions from asymmetry data |137j . 

In this part of the thesis we extend the results of Ref. ITTl by constructing a parametrization of 
the proton F2{x,Q^) structure function which includes all available data, in particular the HERA 
collider data. Besides the obvious motivation of having state-of-the art results for this quantity, the 
main aim of this work is to develop a set of techniques which are required for the application of the 
method of Ref. [2] to cases where the handling of a large number of disparate data sets is required. 
A more detailed description of this part of the thesis can be found in the original work, Ref. 3 . 

Therefore, we construct a parametrization of the -^2(2;, Q^) structure function based on all avail- 
able unpolarized charged lepton-proton deep-inelastic scattering data. However, we do not include 
early SLAC data, for which the covariance matrix is not available, since they do not provide any 
extra kinematic coverage, and are anyway less precise than later data. This leaves a total of 13 ex- 
periments, listed in Table along with their main features. Note that the averages of the different 
types of uncertainties are given in percentage. The coverage of the (x,Q^) kinematic plane afforded 
by these data is shown in Fig. 16.51 Note that the kinematical coverage of the experimental data 
included in the present parametrization span 6 orders of magnitude in both x and . 
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Experiment 


Rcf. 






g2 range 


«dat 


(t^stat > 




/ "sys \ 
\ tTstat / 




("tot) 


(P> 


(cov) 


NMC 


138 


2.0 10^^ 


- 6.0 10~^ 


0.5 - 75 


288 


3.7 


2.3 


0.76 


2.0 


5.0 


0.17 


3.S 


BCDMS 


113*111401 


6.0 10"-^ 


- 8.0 10^^ 


7 - 260 


351 


3.2 


2.0 


0.56 


3.0 


5.4 


0.52 


13.1 


E665 


nm 


8.0 10~* 


- 6.0 10~^ 


0.2 - 75 


91 


8.7 


5.2 


0.67 


2.0 


11.0 


0.21 


21.7 


ZEUS94 




6.3 10~° 


- 5.0 10~^ 


3.5 - 5000 


188 


7.9 


3.5 


1.D4 


2.0 


10.2 


0.12 


6.4 


ZEUSBPC95 


17431 


2.0 10~^ 


- 6.0 10~^ 


0.11 - 0.65 


34 


2.9 


6.6 


2.38 


2.0 


7.6 


0.61 


34.1 


ZEUSSVX95 




1.2 10~^ 


- 1.9 10~^ 


0.6 - 17 


44 


3.8 


5.7 


1.53 


1.0 


7.1 


0.10 


4.1 


ZEUS97 


ITT^ 


6.0 10~^ 


- 6.5 10~1 


2.7 - 30000 


240 


5.0 


3.1 


0.93 


3.0 


6.7 


0.29 


7.0 


ZEUSBPT97 


ESI 


6.0 10~^ 


- 1.3 10~^ 


0.045 - 0.65 


70 


2.6 


3.6 


1.40 


1.8 


4.9 


0.41 


8.8 


H1SVX95 


inn 


6.0 10"" 


- 1.3 10"-^ 


0.35 - 3.5 


44 


6.7 


4.6 


0.74 


3.0 


8.9 


0.36 


28.1 


H197 




3.2 10~^ 


- 6.5 10~^ 


150 - 30000 


130 


12.5 


3.2 


0.31 


1.5 


13.3 


0.06 


10.9 


H1LX97 




3.0 10~^ 


- 2.0 10~^ 


1.5 - 150 


133 


2.6 


2.2 


0.87 


1.7 


3.9 


0.30 


3.9 


H199 




2.0 10~^ 


- 6.5 10^^ 


150 - 30000 


126 


14.7 


2.8 


0.24 


1.8 


15.2 


0.05 


11.0 


HlOO 


Ell 


1.3 10~^ 


- 6.5 10^^ 


100 - 30000 


147 


9.4 


3.2 


0.42 


1.8 


10.4 


0.09 


8.6 



Table 6.4: Experiments included in this analysis. All values of a and cov are given as percentages. 



As has been discussed in Section r4.1.2l deep- inelastic structure functions are defined by parametriz- 
ing the deep-inelastic neutral current scattering cross section as 



xy^Fi{x,Q^) + {l^y)F2{x,Q^) + y(l-^'^F3{x,Q^)] . (6.18) 



We will construct a parametrization of the structure function F2{x, Q^), which provides the bulk of 
the contribution to Eg. 16.181 In fact, a large number of experiments present their results in terms 
of the reduced cross section, 

~f ^2^_ <Pa{x,Q^) 

which is equal to F2{x,Q^) in most of the {x,Q^) kinematical range. For all experiments the 
longitudinal structure function Fl(x,Q^) contribution, defined as 

Fl{x,Q^)= il + —^jF2{x,Q^)-2xFi{x,Q^) , (6.20) 

to the cross section has already been subtracted by the experimental collaborations, except for 
ZEUSBPC95, where we subtracted it using the values published by the same experiment. Note 
that the structure function F2 receives contributions from both 7 and Z exchange, though the Z 
contribution is only non-neghgible for the high datasets ZEUS94, H197, H199 and HlOO. We 
will construct a parametrization of the structure function F2 defined in Eq. 16.181 i.e. containing all 
contributions, since it is closer to the quantity which is experimentally measured, the reduced cross 
section Eq. 16.191 When the experimental collaborations provide separately the contributions to F2 
due to ^ or Z exchange we have recombined them in order to get the full F2 Eq. 16.181 

Experimental data on deep-inelastic structure functions consist on central values, statistical 
errors, and the contributions from the different sources of correlated and uncorrelated uncertainties. 
Uncorrelated systematic errors are added in quadrature to the statistical errors to construct the 
total uncorrelated uncertainty. On top of this, for some experiments, in particular for the ZEUS94, 
ZEUSSVX95 and ZEUSBPT97 experiments some uncertainties are asymmetric. For the treatment 
of asymmetric uncertainties we follow the prescription discussed in Section 15.1.11 

The construction of a parametrization of F2{x,Q^) according to general strategy described in 
Chapter |S1 consists in three steps: generation of a set of Monte Carlo replicas of the original exper- 
imental data, training of a neural network to each replica and finally statistical validation of the 
constructed probability measure. The Monte Carlo replicas of the original experiment are generated 
as a multi-gaussian distribution: each replica is given, following Eq. 15.31 by a set of values 

/ N,y, \ 

Ft'^^'^ -{1 + Ft'^ + -S,H-sys,. + r^a,, , k = 1, . . . , N,^, , (6.21) 
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10 100 1000 


(PE (f'^''*' 

J, r^(art) 




1.88% 0.64% 0.20% 
0.99919 0.99992 0.99999 




6.7 X lO"'' 2.0 X lO"'' 6.9 x 10"^ 
37.21% 11.77% 3.43% 
0.0292 0.0317 0.0316 
0.945 0.995 0.999 




dat 
It 


8.1 X 10-2 7.8 X 10-3 7.3 x lO"'^ 
0.3048 0.3115 0.2920 
0.696 0.951 0.995 


r [cov^'^''* 


)dat 
iat 
1 


5.2 X lO-'^ 6.8 X 10"^ 6.9 x 10"^ 
0.00013 0.00018 0.00015 
0.687 0.941 0.994 



Table 6.5: Comparison between experimental and Monte Carlo data. 

The experimental data have (cr^°''P>)^^^ = 0.0311, (/o*'''"'')^^^ = 0.2914 and (cov^^^'p))^^^ = 0.00015. AU 
statistical indicators are defined in Sectionf5.il 



where the various errors are defined in Eqns. 15.415.81 As has been discussed before, the value of iVrep 
is determined in such a way that the Monte Carlo set of replicas models faithfully the probability 
distribution of the data in the original set. A comparison of expectation values, variances and 
correlations of the Monte Carlo set with the corresponding input experimental values as a function 
of the number of replicas is shown in Fig. 16.61 where we display scatter plots of the central values 
and errors for samples of 10, 100 and 1000 replicas. The corresponding plot for correlations is 
essentially the same as that shown in Ref. jjjj . A more quantitative comparison is performed using 
the statistical estimators as defined in Section lOI The results for these estimators are presented in 
Table 1^31 Note in particular the scatter correlations r for central values, errors and correlations, 
which indicate the size of the spread of data around a straight line. The table shows that a sample 
of 1000 replicas is sufficient to ensure average scatter correlations of 99% and accuracies of a few 
percent on structure functions, errors and correlations. 

A'rcp neural networks are then trained on the Monte Carlo replicas, by training each neural 
network on all the i^^^'^^^^''*) data points in the fc-th replica. The architecture of the networks is the 
same as in Ref. |11|. The training is subdivided in three epochs, each based on the minimization 
of a different error function, as described in Section 15.2.31 First one minimizes ^ , then Ef> and 
finally correlated systematics are incorporated in the minimization of £'3''^. The rationale behind 
this three-step procedure is that the true minimum which the fitting procedure must determine is 
that of the error function with correlated systematics E^lf' Eq. 15.341 However, this is nonlocal and 
time consuming to compute. It is therefore advantageous to look for its rough features at first, then 
refine its search, and finally determine its actual location. 

The minimum during the first two epochs is found using back-propagation (BP), discussed in 
Section 15.2.31 This method is not suitable for the minimization of the nonlocal function Eq. 15.341 
In Ref. BP was used throughout, and the third epoch was omitted. This is acceptable provided 
the total systematics is small in comparison to the statistical errors, and indeed it was verified that 
a good approximation to the minimum of Eq. 15.341 could be obtained from the ensemble of neural 
networks. This is no longer the case for the present extended data set, as we shall see explicitly in 
brief. Therefore, the full e'^'^ Eg. 15.341 is minimized in the third training epoch by means of genetic 
algorithms (GA), also discussed in Section 15. 2. 31 
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Figure 6.6: Scatter plot of experimental data versus Monte Carlo artifical data for both central values and 
errors. 



At the end of the GA training we are left with a sample of A^rop neural networks, from which e.g. 
the value of the structure function at (x, Q^) can be computed as 



The goodness of fit of the final set is thus measured by the per data point, Eq. 15.761 which, given 
the large number of data points is essentially identical to the per degree of freedom. 

In order to apply the general strategy to the present problem several points must be considered: 
the choice of training parameters and training length, the choice of the actual data set, and the 
choice of theoretical constraints. We now address these issues in turn. The parameters and length 
for the first two training epochs have been determined by inspection of the fit of a neural network 
to the central experimental values. Clearly, this choice is less critical, in that it is only important in 
order for the training to be reasonably fast, but it does not impact on final result. 

After these first two training epochs, the diagonal error function E!f\ Eg. 15.331 for a training 
on central values is of order two for the central data set, with a similar length of the training than 
that that was required to reach i?2°'' ~ 1.3 for the smaller data set of Ref. ^J. The value of E^^\ 
Eg. 15. 341 which is always bounded by it, E3 < E2 is accordingly smaller (see Table I^THll . The training 
algorithm then switches to GA minimization of the i?3 , Eq. 15.341 The determination of the length 
of this training epoch is critical, in that it controls the form of the final fit. This can only be done 
by looking at the features of the full Monte Carlo sample of neural networks. 

Before addressing this issue, however, it turns out to be necessary to consider the possibility of 
introducing cuts in the data set. Indeed, consider the results obtained after a GA training of 4 x 10''' 
generation to the central data set, displayed in Table This is a rather long training: indeed, in 
each GA generation all the data are shown to the nets. Hence in 4 x 10** GA generations the data 
are shown to the nets 0.7 x 10® times, comparable to the number of times they are shown to the nets 
during BP training. It is apparent that whereas ~ 1 for most experiments, it remains abnormally 
high for NMC and especially ZEUS94 and ZEUSSVX95. Because of the weighted training which 
has been adopted, this is unlikely to be due to insufficient training of these data sets, and is more 
likely related to problems of these data sets. 

Whereas ZEUSSVX95 only contains a small number of data points, NMC and ZEUS94 account 
each for more than 10% of the total number of data points, and thus they can bias final results 
considerably. The case of NMC was discussed in detail in Ref. ^Jj . This data set is the only one to 
cover the medium- a;, medium-small region (see Fig. I6.5|l and thus it cannot be excluded from the 
fit. As discussed in Ref. the relatively large value of E^ for this experiment is a consequence of 




(6.22) 
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TABLE 3 





A 


B 


Experiment 


E2 


E3 


E2 


E3 


Total 


2 


05 


1 


54 


2 


03 


1 


36 


NMC 


1 


97 


1 


56 


1 


74 


1 


54 


BCDMS 


1 


93 


1 


66 


1 


32 


1 


26 


E665 


1 


64 


1 


37 


1 


83 


1 


38 


ZEUS94 


3 


15 


2 


26 


3 


01 


2 


21 


ZEUSBPC95 


4 


18 


1 


32 


5 


18 


1 


24 


ZEUSSVX95 


3 


37 


1 


88 


5 


68 


2 


11 


ZEUS97 


2 


33 


1 


54 


3 


02 


1 


37 


ZEUSBPT97 


2 


82 


1 


97 


2 


08 


1 


22 


H1SVX95 


3 


21 





96 


4 


74 


1 


09 


H197 





86 





76 


1 


08 





87 


H1LX97 


1 


96 


1 


46 


1 


50 


1 


18 


H199 


1 


15 


1 


07 


1 


10 


1 


01 


HlOO 


1 


59 


1 


50 


1 


48 


1 


26 



Table 6.6: The uncorrelated error function, E2 , Eg. 15.881 and the correlated one, -Eg , Eg. 15.841 for the 
fit to the central data points: (A) after the back-propagation training epoch and (B) after the final genetic 
algorithms training epoch. 



internal inconsistencies within the data set. A value of -E3 « 1.5 indicates that the neural nets do 
not reproduce the subset of data which are incompatible with the bulk, as it should be, whereas a 
value E3 fa 1 could only be obtained by overlearning, i.e. essentially by fitting irrelevant fluctuations 
(see Ref. 

Let us now consider the case of ZEUS94. The kinematic region of this experiment is entirely 
covered by the ZEUS97, H197, H199 and HlOO experiments. We can therefore study the impact of 
excluding this experiment from the global fit, without information loss. The results obtained in such 
case are displayed in Table IHTI when the experiment is not fitted the E3 value for all experiments 
with which it overlaps improves and so does the global E3, whereas E3 for ZEUS94 itself only 
deteriorates by a comparable amount, despite the fact that the experiment is now not fitted at all. 
We conclude that the experiment should be excluded from the fit, since its inclusion results in a 
deterioration of the fit quality, whereas its exclusion does not entail information loss. Difhculties in 
the inclusion of this experiment in global fits were already pointed out in Refs. [ 1521 1153| . where it 
was suggested that they may be due to underestimated or non-gaussian uncertainties. It is likely 
that ZEUSSVX95 has similar problems. However, its inclusion in the fit is no reason of concern, even 
if its high E3 value were due to incompatibility of this experiment with the others or underestimate 
of its experimental uncertainties, because of the small number of data points. It is therefore retained 
in the data set. Our final data set thus includes all experiments in Table except ZEUS94. We 
are thus left with iVdat =1698 data points. 

For the sake of future applications, it is interesting to ask how the procedure of selecting exper- 
iments in the data set can be automatized. This can be done in an iterative way as follows: first, a 
neural net (or sample of neural nets) is trained on only one experiment; then, the total E3 for the 
full data set is computed using this neural net (or sample of nets); the procedure is then repeated for 
all experiments, and the experiment which leads to the smallest total E3 is selected. In the second 
iteration, the net (or sample of nets) is trained on the experiment selected in the first iteration plus 
any of the other experiments, thereby leading to the selection of a second experiment to be added 
to that selected previously, and so on. The process can be terminated before all experiments are 
selected, for instance if it is seen that the addition of a new experiment does not lead to a significant 
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TABLE 4 



Experiment 


E3 


Total 


1.25 


NMC 


1.51 


BCDMS 


1.24 


E665 


1.23 


ZEUS94 


2.28 


ZEUSBPC95 


1.16 


ZEUSSVX95 


2.08 


ZEUS97 


1.37 


ZEUSBPT97 


1.00 


H1SVX95 


1.04 


H197 


0.84 


H1LX97 


1.19 


H199 


1.00 


HlOO 


1.24 



Table 6.7: The same fit as the last column of Table if the ZEUS94 data are excluded from the fit. 



improvement in £^3 for a large enough length of training. 

We now proceed to discuss the length of training for our final data set. The total x^, Eg. 1^7^ 
as computed from averages over the trained network sample, is shown in Fig. 16.71 as a function 
of the number of GA generations. The decreases very rapidly during the first few hundreds 
of training generations, a typical feature of genetic algorithms minimization. After about 5000 
training generations, the &s a function of the training length essentially flattens for all experiments 
but BCDMS. The further decrease of the total is then due essentially to the decrease of the 
contribution from BCDMS. A training length of 4 x 10"* GA generations is necessary in order for 
the of BCDMS to flatten out at x^ ~ 1-2. As discussed in Ref. 'TT', the BCDMS data can 
only be learnt with a longer training because they have high precision while being located in the 
intermediate x (valence) region, where the parton distributions display significant variation. 

The iJg"^ values for the fit of a neural net to the central data with this training is given in 
Table 16.61 It shows that all experiments are well reproduced with the exceptions discussed above. 
It is interesting to observe that while E^^ Eg . 15 . 34l decreases significantly during the GA training, 

the uncorrelated i?2°'' j Eq. 15.331 decreases marginally, and in fact it actually increases for several 
HERA experiments. This shows that correlations are sizable for the HERA experiments, so that the 
approach of Ref. 11 , based on the minimization of E2, is not adequate in this case. GA minimization 
appears to be very efficient in reducing the E3 value relatively fast. 

We finally turn to the issue of theoretical constraints. The only theoretical assumption on the 
shape of F2{x,Q^) is that it satisfies the kinematic constraint F2{1,Q^) = for all Q^. As this 
constraint is local, its implementation is straightforward: it can be enforced by including in the data 
set a number of artificial data points which satisfy the constraint with a suitably tuned error. In the 
present fit we have checked that the best choice is to add a number of artificial points at a; = 1, as 
discussed in Section [5.2.41 equal to 2% of the experimental trained points (33 points with ZEUS94 
excluded from the fits), and with error equal to one tenth of the mean statistical error of the trained 
points. These points are equally spaced in InQ^, within the range covered by the experimental data. 

The result of the minimization of a single neural net to the central data points is shown in Table 
16.71 The results for the final set of 1000 neural networks are displayed in Table 1^751 while in Table 
we give the details of results for each experiment. Note that the figure of merit for the minimization 
E3, Eg. l5.34| and its average, defined by Eq. 15.771 differs from the full x^, Eg. 15. 761 not only because 
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BP training, cycles/10* | GA training, cycles/10^ Cycles/1000 



Figure 6.7: Dependence of the total Ea. l5.76l on the length of training: (left) total training (right) detail 
of the GA training. 



the latter is computed from the structure function averaged over nets Eq. 15.221 but also because of 
the different treatment of normalization errors in the respective covariance matrices, Eq. 15.351 and 
Eq. 15.51 Besides the we also list the values of various quantities, defined in Section 15.31 which 
can be used to assess the goodness of fit. 

The quality of the final fit is somewhat better than that of the fit to the central data points 
shown in Table 15.71 In particular, with the exception of NMC (which is likely to have internal 
inconsistencies and ZEUSSVX95 (which is likely to have the same problems as those of ZEUS94) 
the psr degree of freedom is of order 1 for all experiments. It is interesting to note that the 
for the neural network average is rather better than the average (£'3). The (scatter) correlation 
between experimental data and the neural network prediction equals one to about 1% accuracy, 
with the exception of NMC, ZEUSSVX95 (which have the aforementioned problems) and E665. 
The E665 kinematic region overlaps almost entirely (apart from very small < 1 GeV^) with that 
of NMC and BCDMS, while having lower accuracy (this is why the experiment was not included 
in the fits of Ref. ^I]). The data points corresponding to this experiment are therefore essentially 
predicted by the fit to other experiments, thus explaining the somewhat smaller scatter correlation. 

The average neural network variance is in general substantially smaller than the average ex- 
perimental error, typically by a factor 3 — 4. This is the reason why (E) > x^'- the neural nets 
fluctuate less about central experimental values than the Monte Carlo replicas. In the presence of 
substantial error reduction, the (scatter) correlation between network covariance and experimental 
error is generally not very high, and can take low values when a small number of data points from 
one experiment is enough to determine the outcome of the fit, such as in the case of the NMC 
experiment, even more so for E665 

As discussed extensively in Ref. it is important to make sure that this is due to the fact 
that information from individual data points is combined through an underlying law by the neural 
networks, and not due to parametrization bias. To this purpose, the 7?,-estimator has been introduced 
in Section [5.31 where it was shown that in the presence of substantial error reduction 7?. > 1 if there 
is parametrization bias, whereas TZ » 0.5 in the absence of parametrization bias. It is apparent from 
Tables lO^ and lHl^ that indeed TZ « 0.5 for all experiments. Note that, contrary to what was found in 
ref. [11 , there is now some error reduction also for the BCDMS experiment, though by a somewhat 
smaller amount than for other experiments. We will come back to this issue when comparing results 
to those of Ref. [TT] . 

Further evidence that the error reduction is not due to parametrization bias can be obtained 
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1.18 


{E) 


2.52 


J- |^^(nct)j 


0.99 


7^ 


0.54 



/ dat 



/cr(cxp)\ 

\ / C 



((^(net)^ 
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dat 



dat 



(P*^'^^^>dat 

(p^'^:'^>dat 



"(art) 



(cov(°''P)) 



{ 



dat 
dat 



1.2 10^3 
80% 
0.027 
0.008 
0.73 



0.20 
0.31 
0.67 
0.54 



3.3 10-^ 
1.3 10-"* 
3.6 10-5 
0.49 



Table 6.8: 

Estimators of the final results for the constructed probability measure of F2(x,Q2). 



by studying the dependence of (o''"™*'')^^^ on the length of training. This dependence is shown in 
Fig. 16.81 for the BCDMS experiment. It is apparent that the error reduction is correlated with the 
goodness of fit displayed in Fig. 16.71 and it occurs during the GA training, thereby suggesting 
that error reduction occurs when the neural networks start reproducing an underlying law. If error 
reduction were due to parametrization bias it would be essentially independent of the length of 
training. 

The point-to-point correlation p of the neural nets is somewhat larger than that of the data, 
as one might expect as a consequence of an underlying law which is being learnt by the neural 
networks. In fact, for the NMC experiment the increase in correlation essentially compensates the 
reduction in error, in such a way that the average covariance of the nets and the data are essentially 
the same. This again shows that in the case of the NMC experiment a small number of points 
is sufficient to predict the remaining ones. For all other experiments, however, the covariance of 
the nets is substantially smaller than that of the data. As a consequence the (scatter) correlation 
of covariance remains relatively high for all experiments, except NMC, and especially E665 whose 
points are essentially predicted by the fit to other experiments. 

The structure function and associated one-a error band is compared to the data as a function of 
X for a pair of typical values of in Fig. 16.91 In Fig. l6.10l the behavior of the structure function as 
a function of x at fixed and as a function of at fixed x is also shown. It is apparent that in the 
data region the error on the neural nets is rather smaller than that on the data used to train them. 
The error however grows rapidly as soon as the nets are extrapolated outside the region of the data, 
specially in the small- a; region and in the large-Q^ region. At large x, however, the extrapolation 
is kept under control by the kinematic constraint i^2(l,Q^) = 0. Note that the increase of the 
uncertainties in the extrapolation region, as opposed to the case for fits with functional forms, is due 
to the fact that the behavior of neural networks in this region is not determined by the corresponding 
behavior in the data (interpolation) region. 

The number of possible phenomenological applications of the neural network parametrization of 
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Figure 6.8: 

Dependence of on the length of training for the BCDMS experiment. 



the proton structure function Q^) is rather large, from determinations of the strong coupling 

as {Q^), comparison of different theoretical predictions at low x, checks of sum rules, or its effects 
of the extraction of polarized structure functions and polarized parton distributions. Another pos- 
sibility is a detailed quantitative study of i^|'(a;,Q^) in the transition region between perturbative 
and nonperturbative regimes around — 1 GeV^. Our parametrization is specially suited for this 
purpose since it incorporates all the information from experimental data without introducing any 
bias from the functional form behavior of the transition region. In Fig. Iti.lll we show the logarithmic 
derivative of the structure function, defined as 



A(g^xo) 



_ lnF2ix,Q^) 



din a; 



(6.23) 



for two different values of xq. Note that at very low expectations are that the Pomeron exponent, 
A((52 ^ 0) = 0.08 is recovered. 

Let us finally compare the determination of F2{x, Q^) presented here with that of Ref. , which 
was based on pre-HERA data. In Fig. 16.121 one-cr error bands for the two parametrizations are 
compared, whereas in Fig. 16.1.^ we display the relative pull of the two parametrizations, introduced 
in Section [5. HI which in the present situation is given by 



jpncw 



^<jl^Jx,Q^)+ai,^{x,Q^) 



(6.24) 



where a{x,Q^) is the error on the structure function determined as the variance of the neural 
network sample. In view of the fact that the old fit only included BCDMS and NMC data, it is 
interesting to consider four regions: (a) the BCDMS region (large x, intermediate Q^, e.g. x — 0.3, 
Q2 = 20 GeV^); (b) the NMC region (intermediate x, not too large Q"^, e.g. x = 0.1, Q"^ = 2 GeV^); 
(c) the HERA region (small X and large (^^ e.g. 0.01 and > 10 GeV^); (d) the region where 
neither the old nor the new fit had data (very large or very small Q^)- In region (a) the new fit 
is rather more precise than the old one, for reasons to be discussed shortly, while central values 
agree, with P < 1). In region (b) the new fit is significantly more precise than the old one, while 
central values agree to about one sigma. In region (c) the new fit is rather accurate while the old fit 
had large errors, but P ^ 1 nevertheless, because the HERA rise of F2 is outside the error bands 
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Figure 6.9: 

Final results for F2{x,Q ) compared to experimental data. For the neural network result, the one-cr error 

band is shown. 



extrapolated from NMC. This shows that even though errors on extrapolated data grow rapidly they 
become unreliable when extrapolating far from the data. Finally in region (d) all errors are very 
large and P is consequently small, except at small x and large Q^, where the new fits extrapolate 
the rise in the HERA data, which is missing altogether in the old fits. 

Let us finally come to the issue of the BCDMS error, which, as already mentioned, is reduced 
somewhat in the current fit in comparison to the data and the previous fit. This may appear 
surprising, in that the new fit does not contain any new data in the BCDMS region. However, as 
is apparent from Fig. 16.81 this error reduction takes place in the GA training stage, when is 
minimized. Furthermore, we have verified that if the uncorrelated E2 is minimized during the GA 
training no error reduction is observed for BCDMS. Hence, we conclude that the reason why error 
reduction for BCDMS was not found in Ref. ^I] is that in that reference neural networks were 
trained by minimizing £'2. In fact, as discussed above, the BCDMS experiment turns out to require 
the longest time to learn, especially after inclusion of the HERA data. Error reduction only obtains 
after this lengthy minimization process. 

In summary, we have presented a determination of the probability density in the space of structure 
functions for the structure function F2{x,Q^) for the proton, based on all available data from the 
NMC, BCDMS, E665, ZEUS and HI collaborations. Our results take the form of a Monte Carlo 
sample of 1000 neural networks, each of which provides a determination of the structure function 
for all {x,Q^). The structure function and its statistical moments (errors, correlations and so on) 
can be determined by averaging over this sample. The results of this part of the thesis are made 
available as a FORTRAN routine which gives F2{x,Q^), determined by a set of parameters, and 
1000 sets of parameters corresponding to the Monte Carlo sample of structure functions. They can 
be downloaded from the web site http://sophia.ecm.ub.es/f2neural/ 

This works updates and upgrades that of Ref. ^I] , where similar results were obtained from the 
BCDMS and NMC data only. The main improvements in the present work are related to the need 
of handling a large number of experimental data, affected by large correlated systematics. Apart 
from many smaller technical aspects, the main improvement introduced here is the use of genetic 
algorithms to train neural networks on top of back-propagation. This has allowed for a more accurate 
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Figure 6.10: 

One-cr error band for the structure function F2{x, Q^) computed from neural nets. Note the different scale 

on the y axis in the two plots. 



handling of correlated systematics. 

Whereas the results of this part of the thesis are of direct practical use for any application where 
an accurate determination of Q^) and its associate error are necessary, its main motivation is 

the development of a set of techniques which will be required for the construction of a full set of 
parton distributions with faithful uncertainty estimation based on the same method. The application 
of this strategy to the parametrization of parton distribution functions is discussed in Section 16.41 
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Figure 6.11: 

The logarithmic derivative of F2{x,Q'^), A(Q^), as determined from the neural network parametrization in 
the transition region between perturbative and nonperturbative regimes. 




Figure 6.12: 

Comparison of the parametrization of F2{x,Q^) of ref. TT; (old) with that described in 131 (new). The 
pairs of curves correspond to a one-cr error band. 
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4.4 10~^ 3.8 10~^ 1.7 lO""* 

5.2 10~^ 2.3 10~^ 3.3 10~^ 

-0.03 0.98 0.16 



n-9 



Expuiiiiii lit 




ZErri^\'Xif:> 






Hi^\"X!),""j 


HilJT 


HiLXIJT 




iliOO 


(E) 


1.02 
2.07 


2.08 
2.03 


1.35 
2.24 


0.86 
2.08 


0.67 
2.03 


0.71 
1.91 


1.07 
2.41 


0.90 
1.93 


1.11 

2.11 




0.98 


0.96 


0.99 


0.99 


0.97 


0.99 


0.99 


0.98 


0.99 




0.51 


0.66 


0.55 


0.55 


0.44 


0.46 


0.53 


0.48 


0.54 


(4^<"-j)da. 


4.3 10~^ 


0.0035 


0.0010 


1.3 10"* 


0.0043 


0.0030 


0.0005 


0.003 


0.0013 


(-4-'"°"])da. 


0.91 


0.94 


0.87 


0.72 


0.96 


0.95 


0.75 


0.96 


0.93 


/<,(«P)\ 

\ / dat 


0.022 


0.061 


0.037 


0.012 


0.063 


0.040 


0.027 


0.051 


0.030 


\ ^ / dat 


0.006 


0.013 


0.011 


0.006 


0.011 


0.008 


0.008 


0.008 


0.009 




0.85 


0.72 


0.86 


0.73 


0.84 


0.87 


0.42 


0.82 


0.89 


(^k""l>dat 


0.09 


0.30 


0.12 


0.14 


0.118 


0.14 


0.31 


0.16 


0.14 


/p(<=''P)\ 

\ / dat 
/p<"<=t)\ 

\ / dat 


0.61 


0.24 


0.28 


0.40 


0.36 


0.06 


0.29 


0.05 


0.09 


0.77 


0.64 


0.39 


0.63 


0.57 


0.27 


0.58 


0.29 


0.26 


r L(n<=t)T 


0.53 


0.40 


0.66 


0.60 


0.48 


0.51 


0.69 


0.37 


0.55 


(n-^'""j)dat 


6.4 10~^ 


1.9 10~^ 


3.4 10^^ 


1.4 10^" 


3.0 10~^ 


3.8 10"''' 


3.8 10~^ 


2.7 10~^ 


1.7 10~^ 


/ „^(cxp)\ 

\ / dat 


2.8 10~^ 


8.5 10"" 


3.7 lO"* 


5.8 10~^ 


0.0014 


1.0 10~* 


2.1 10~^ 


1.4 10~^ 


9.6 10~^ 


\ )dat 


2.8 10~^ 


1.2 10~^ 


3.2 10~^ 


2.3 10~^ 


7.0 10~^ 


1.510~^ 


6.9 10~^ 


1.6 10~^ 


2.2 10~^ 


r Tcovfn^t)! 


0.69 


0.48 


0.77 


0.65 


0.53 


0.61 


0.57 


0.54 


0.58 



Table 6.9: Final results for the individual experiments: fixed target (top) and HERA (bottom) 
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Figure 6.13: 

The relative pull, Eq. IfHH of the new and old F2{x,Q^) parametrizations. The one-a band is also shown. 
Note that in the kinematical region where experimental data for the two versions of the fit overlaps, the 



pull always verifies 



< 1, as expected from consistency arguments. 
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6.3 Lepton energy spectrum in B meson decays 

The third apphcation of the general strategy defined in Chapter|Slis the parametrization of the lepton 
energy distribution from semileptonic decays of B mesons. The motivation for such a parametriza- 
tion is the application of the general strategy of Chapter \S\ to the construction of the probability 
density of an observable when only its moments are available experimentally. As a byproduct of this 
parametrization, we will provide a determination of the b quark mass mf, (mfc). 

In the last decade the field of B meson physics has been the object of a wealth of studies (see 
Ref. |154j and references therein), motivated by the high precision measurements coming from 
the B factories Belle and Babar. In particular the inclusive semileptonic decays B — > Xlv, where 
X stands for a hadronic system, have received a lot of attention, both in the theoretical and in 
the experimental sides, due to its paramount importance for the determination of elements of the 
CKM matrix, and also since they provide us with important information on the underlying strong 
interaction dynamics. 

Therefore, our purpose in this part of the thesis is to allow a more general comparison of the 
theoretical predictions with the experimental data, constructing a parametrization of the lepton 
energy spectrum from available experimental information on its moments, supplemented by con- 
straints from the kinematics of the process. This goal is part of a more general motivation, namely 
the development of a suitable general strategy to determine a function with uncertainties if the only 
available experimental information is given in terms of its convolutions and additional constraints 
(like kinematical cuts). 

The success of previous applications of the technique introduced in Chapter [S] to other problems 
motivated us to implement this technique in the context of B physics. Therefore, in this work we 
construct a unbiased parametrization of the lepton energy spectrum in semileptonic B meson decays 
with a faithful estimation of the uncertainties. 

The semileptonic decays of the B meson have been introduced in Section 14.1.31 We will con- 
sider therefore experimental data for leptonic moments with kinematical cuts of the lepton energy 
distribution, Eq. 14.481 in semileptonic B meson decays to charmed final states B — > XJ,v. These 
moments have recently been measured with great accuracy at the B-factories, Babar jHO] and Belle 
|155j . as well as by the Cleo )15f)j detector. We will therefore use in the present analysis the latest 
data from these three experiments. We do not use data from CDF jl57j since it is restricted to 
hadronic moments and leptonic moments are not available. The features of the experimental data 
of these three experiments can be seen in Table 16.101 Note that for all experiments correlations are 
rather large, so it is compulsory to incorporate them in a consistent way in the statistical analysis 
of the data. 



Experiment 


A^dat 




n 


Eo (GeV) 


(Cstat) 


(ftot) 


(P> 


Babar 60 


20 





- 3 


0.6 - 1.5 


0.06 


0.08 


0.50 


Belle |T55j 


18 


1 


- 3 


0.4 - 1.5 


0.15 


0.16 


0.34 


Cleo[TKH] 


20 


1 


- 2 


0.6 - 1.5 


0.008 


0.0013 


0.65 



Table 6.10: Features of experimental data on lepton moments Af„(iJo),Eqns. 16.2516.271 where n stands for 
the order of the moment and Eq the lower cut in the lepton energy, see Eq. 14.491 Experimental errors are 
given as percentages. 

The final published observables for the semileptonic decays of the B meson are the moments of 
the lepton energy spectrum, Eq. 14.491 with different cuts in the lepton energy, rather than the full 
differential spectrum itself. The data that we use for the present parametrization of the lepton energy 
spectrum is the following: the Babar Collaboration "60' provides the partial branching fractions, 

Mo{Eo) = tbLo{Eo,0) ^tb (^') "^^i ' (6.25) 
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where tb is the average B meson hfetime |158) . the first moment 



and the central moments, 



]\J fT7 \ Ln{Eo, Mi{Eo)) (ann\ 

Mn{Eo) = ^ , n = 2,3, (6.27) 

for five difi'erent values of Eq from 0.6 to 1.5 GeV, which makes a total of 20 data points. The 
leptonic moments have been defined in Eq. 14.491 

The Belle Collaboration (155) provides the same moments, Af„(£'o) for n = 1, n = 2 and 
n ~ 3. For example, they define Mi = (Ei), which if one takes into account that the corresponding 
normalized probability density is given by 

^(^')-( r^„,..lr,^ )^(^')' Eo<E,<E^.., (6.28) 

\ J Eq dEi ^ / 

one ends up with Eq. 16.261 and similarly for the remaining moments. The difference with the Babar 
data is that the partial branching fraction Eq. 16.251 is not measured, and that the Belle data cover 
a somewhat larger lepton energy range, since the lowest value of Eq of their data set is Eq — 0.4 
GeV. These moments, for six different values of Eq from 0.4 to 1.5 GeV, make up a total of 18 data 
points. Finally the Cleo Collaboration '156 provides the moments A/„(i?o) for ti = 1, 2, for energies 
between 0.6 to 1.5 GeV, for a total of 20 data points (10 for n — 1 and 10 for n — 2). The correlations 
for this experiment are larger since measurements of the same moment at different energies Eq are 
highly correlated. The three collaborations provide also the total and statistical errors, as well as the 
correlation between different measurements. These characteristics are summarized in Table ISTTUl 

We have noticed that the experimental correlation matrices, p^i^^\ as presented with the pub- 
lished data of the three experiments |155[ I60L I156| . are not positive definite (see also '159J). The 
source of this problem can be traced to the fact that off-diagonal elements of the experimental corre- 
lation matrix are large, as expected since moments with similar energy cut contain almost the same 
amount of information and are therefore highly correlated. Then one can check that some eigen- 
values are negative and small, and this is pointing that the source of the problem is an insufficient 
accuracy in the computation of the elements of the correlation matrix. 

However, whatever is the original source of the problem, the fact that the experimental correlation 
matrices are not positive definite has an important consequence: the technique introduced in Chapter 
Ofor the generation of a sample of replicas of the experimental data taking into account correlations 
relies on the existence of a positive definite correlation matrix, and therefore if the experimental 
correlation matrix is not positive definite, our strategy cannot be applied. 

A method to overcome these difficulties while keeping the maximum amount on information on 
experimental correlations as possible consists on removing those data points for which the exper- 
imental correlations are larger than a maximum correlation, pf^^^ > p"^'^^. The value of p"^'^^ is 
determined separately for each experiment as the maximum value for which the resulting correlation 
matrix is positive definite. In Table I^TTI we show the value of p™'^'^ for each experiment, together 
with the features of the remaining experimental data after cutting those data points with too large 
correlations. Similar considerations about the need to take a restricted subset of data points due to 
the large point to point correlations have been discussed in the context of global fits in B physics, 
see for example |16()l I161| . 

As has been discussed in Chapter the first step of our strategy to parametrize experimental 
data is the generation of an ensemble of replicas of the original measurements, which in the present 
case consists in the measured moments, which we will denote by 

Mf'^P\ z-l,...,7Vdat , (6.29) 
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Experiment 


iVdat 




n 


Eo (GcV) 


(Cstat) 


(fftot) 




(P) 


Babar |6Dj 


16 





- 3 


0.6 - 1.5 


0.04 


0.05 


0.97 


0.49 


Belle [ISS] 


15 


1 


- 3 


0.4 - 1.5 


0.18 


0.19 


0.88 


0.31 


cleo [inni 


10 


1 


- 2 


0.6 - 1.5 


0.005 


0.009 


0.95 


0.69 



Table 6.11: Features of experimental data that is included in the fit, after cutting data points with too 
large correlations. Experimental error are given as percentages. 





10 100 1000 






2.47% 0.40% 0.24% 


{PE 
r 


art)\ 
, /riat 


32.4% 13.8% 3.4% 
0.00265 0.00277 0.00268 
0.95 0.99 0.99 


{PE 

(P 
r 


art)\ 

/riat 
p(artTl 


60.1% 19.6% 6.7% 
0.132 0.138 0.155 
0.75 0.96 0.99 


r [cov(^''*)] 


1.1 10-6 1.4 10-6 1.3 10-6 
0.86 0.98 0.99 



Table 6.12: Comparison between experimental and Monte Carlo data. 

The experimental data have ('7''"'''')^^^ = 0.00267 , = 0.166 and (cov'^^p')^^^ = 1.4 10-^ for a 

total of 41 data points. 



where M^'^^'^^ stands for any of Eqns. 16.2516.271 and iVdat is the total number of experimental data 
points, together with the total error and the correlation matrix. 

To generate replicas we proceed in the usual way. Since experimental data consists on central 
values for the moments, together with its total error and the associated correlations, the k-th replica 
of the experimental data is constructed, following Eq. 15.131 as 

^^jart)(fc) ^ ^^jcxp) ^ ^W^^^^^^ ^ J = 1, . . . , iVdat, , A: = 1 , . . . , iV.ep . (6.30) 

One can check that the sample of replicas is able to reproduce the errors and the correlations of 
the experimental data. In Table lO^ we have the statistical estimators for the replica generation. 
One can observe that to reach the desired accuracy of a few percent and to have scatter correlations 
r > 0.99 for central values, errors and correlations, a sample of 1000 replicas is needed. 

The next step is to train a neural network parametrizing the lepton spectrum for each replica of 
the experimental data. Therefore we parametrize the lepton energy spectrum, Eq. 

(-) 



(nct)(fc) 

(El), fc = l,...,iVrep , (6.31) 



where Ei is the lepton energy, with a neural network. From this neural network parametrization 
one can compute the corresponding moments, to compare with experimental data. For example, the 
leptonic moment Li(i?o,0) is computed for the k-th neural network as 

, , /--Emax / rir \ (nct)(fc) 



Now we describe the details of the neural network training. Concerning the topology of the 
neural network, in this case we find that an acceptable compromise is given by an architecture 
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1-4-3-1. Concerning the training strategy, in the present situation we wiU have a single training 
epoch minimizing the diagonal error defined in Eq. 15.331 but with the total error crl'^.^j' instead 
of only the statistical error with dynamical stopping of the training, as discussed in Section 15.2.31 
The minimization technique that we will use for the neural network training is Genetic Algorithms, 
which is suitable for minimization of highly nonlinear error functions, as in the present case. We 
also use weighted training to obtain a more even distribution between the different experiments. 
See the original work 5 for additional details on the neural network training. 

In situations in which experimental data consists of different experiments, as it is the case now 
(with Babar, Belle and Cleo), one has also to address the issue of possible inconsistency between 
different experiments, that is, the possibility that a subset of points from the two experiments in the 
same kinematical region do not agree with each other within experimental errors. This issue has been 
discussed in Section IIOl regarding the possible inconsistency of the ZEUS94 experiment with the 
other experimental data sets in the parametrization of the structure function Q^)- This issue 

is of paramount importance in the context of global parton distribution fits, see for example I83| . 
As has been discussed in detail in Ref. 5 , in this case the three experiments are fully compatible, 
as can be seen in Fig. 16. 141 for a training to the experimental data. 
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Figure 6.14: 

Dependence of the error function iSj"'' for a training of all three experiments on central experimental data. 
Note that at the end of the minimization iSj"^ "C 1 for all experiments. 

The lepton energy spectrum, Eq. 16.321 as parametrized with a neural network, has to satisfy 
three constraints independently of the dynamics of the process. First of all, it is zero outside the 
region where it has kinematical support, in particular it has to vanish at the kinematical endpoints. 
El — and Ei = £'max- Second, the spectrum is a positive definite quantity (since any integral over 
it is an observable, a partial branching ratio), therefore it must satisfy a local positivity condition. 

As we have discussed in Section f5.2.4l there are several methods in our strategy to introduce kine- 
matical constraints in an unbiased way. We have found that for the present application, the optimal 
method to implement the kinematical constraints that the spectrum should vanish at the endpoints 
is hard-wiring them into the parametrization, that is, the lepton energy spectrum parametrized by 
a neural net will be given by 

/ J-p \ (nct)(fc) 

[-j^J (El) = E-^^['^\Ei){E,^,^ - Eir , (6.33) 

with rii, positive numbers, and s}^^ is the output of the neural network for a given value of Ei. 
Assuming this functional behavior at the endpoints of the spectrum introduces no bias since it can 
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be shown that our resuhs do not depend on the value of ni and n2- For the reference training we 
have verified that ni = 1 and n2 = 1 are suitable values for these polynomial exponents. 

We will impose the remaining kinematical constraint, the positivity constraint, as a Lagrange 
multiplier in the total error. That is, the total quantity to be minimized is the sum of two terms, 



(k) 



E. 



(k) 



dT 
dEi 



(net)' 



(6.34) 



where £^2'°'' is Eq. I5.33l and the the positivity condition is implemented penalizing those configurations 
in which a region of the spectrum is negative. 



dT 
dEi 



(net)' 



dE, 



dT \ ^'"^^^ 



(6.35) 



since P is zero for a positive spectrum. The relative weight Ap is determined via a stability analysis, 
requiring that Ap is large enough so that the constraint is verified, but small enough so that experi- 
mental data can still be learned in an efhcient way. As we will show in brief, the implementation of 
the kinematical constraints plays an essential role in the parametrization of the lepton spectrum. In 
particular, if kinematic constraints are not included in the fit the error at small Ei grows very large 
and the extrapolation to -Bq = becomes completely unreliable. 
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Figure 6.15: 

Lepton energy spectrum, Eq. 14.481 as parametrized by the Monte Carlo ensemble of neural networks. The 
bands represent the 1-a uncertainty region. 

The set of neural networks parametrizing the lepton energy distribution trained on the Monte 
Carlo sample of replicas of the experimental data defines the probability measure in the space of 
lepton energy spectra. From this probability measure, as explained in Chapter[Sl expectation values 
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Figure 6.16: 

Comparison of the difFerent experimental moments, Eqns. 16.2516.271 as obtained from our parametrization 
with the original measurements, as a function of the lower cut on the lepton energy Eq. 



of functionals of the lepton spectrum can be computed as 
\ [dEi\/ J dEi 

We now present the results for this parametrization, from which averages and moments can be 
computed with the corresponding uncertainties. In Fig. 16.151 we plot the resulting lepton energy 
spectrum with uncertainties. In Fig. I6.16l we compare the computation of the moments of the lepton 
energy spectrum using our parametrization to the experimental results from Babar, Belle and Cleo. 
We observe good agreement for all the data points. As it has been explained in Section [531 it is 
crucial to validate the results of the parametrization using suitable statistical estimators. In Table 
16.131 we have the most relevant statistical estimators for all the data points, and in Table IHTTH the 
same estimators for the different experiments included in the fit. 

With the results described in this section the total branching ratio can be computed, even if 
experimental information was restricted to a finite value of Eq. This is possible because the continuity 
condition implicit in the neural network definition together with the kinematic constraints allow for 
an accurate extrapolation from the experimental data with lowest Ei — 0.4 GeV to the kinematic 







■ rfr ■ 
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[dEi\ 


\dEi\ 





iVrcp 

E 



dV 
dE, 



(not)(fc) 



(6.36) 
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endpoint Ei = 0. Note that this is not true if the BeUe data is not inchided in the fit, see Fig. 16.211 
The resuh that is obtained for the partial decay rate, computed from the neural network sample, 

{B {B ^ XM) ^ {Mo{E, = 0)) = TS-— 5] / dEi( — \ (El) , (6.37) 

is the total branching ratio, 

B{B^ Xclv) = (10.8 ± 0.4) 10-2 , (6.38) 
which is to be compared with the PDG result |162j . 

B{B^ ^c^i^)pDG = (10-2 ± 0-9) 10"^ , (6.39) 

and with the direct Delphi measurement of the total branching ratio |163| , which is measured without 
restrictions on the lepton energy, which yields 

B{B^ XMi,.XpM = (10.5 ± 0.2) 10-2 . (6.40) 

Is is observed that the three results are compatible, even if our determination is somewhat closer, 
both in the central value and in the size of the uncertainty, to the Delphi measurement. The small 
error in our determination oi B {B ^ Xclv) shows that the technique discussed in this work can be 
used also to extrapolate in a faithful way into regions where there is no experimental data available. 





10 100 1000 


Xtot 


1.31 1.11 1.11 
2.50 2.23 2.21 


{PE [(M),,pJ ) 
r [M] 


9% 8% 5% 
0.999 0.999 0.999 


(^^k^""'J)dat 
('-^^'^^^)dat 
(-^'^"^)dat 


67% 58% 10% 
0.00267 0.00267 0.00267 
0.00180 0.00168 0.00168 

0.77 0.85 0.85 


(P^'^'^^Odat 
(P^""^).at 


0.166 0.166 0.166 
0.32 0.245 0.245 
0.35 0.38 0.38 


(cO-^""^>dat 

r [cov("°*)] 


1.4 10-6 1.4 10-6 1.4 10-6 
7.8 10-^ 6.7 lO-'^ 6.7 lO-'^ 
0.49 0.53 0.53 



Table 6.13: Statistical estimators for the ensemble of trained neural networks, for 10, 100 and 1000 trained 
replicas. 

Important information on the quality of the fit can be obtained from the analysis of the de- 
pendence of the different statistical estimators with respect to the number of genetic algorithms 
generations, like the total or the average error. This dependence is represented in Fig. 16.171 
Note that at the end of the training Xtot ~ 1 ^'^d (x^) ~ 2, as expected. Note also that the fit has 
reached convergence since the \xot profile is very fiat for a large number of generations. 

This can be repeated for other statistical estimators, like for example the average spread of 
the data points as computed from the neural network ensemble, defined in Section [5.3.11 which is 
compared with the same quantity computed from the experimental data, a'f^^^ in Fig. 16.181 as a 
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Babar Belle Clco 


Xtot 


0.42 1.81 1.14 
1.75 2.75 2.21 


(PE (M) ) 

\ \ / rep / 

r[M] 


0.78% 18% 0.6% 
0.999 0.999 0.999 


\^ /dat 


47% 60% 71% 
0023 0021 0041 

yj.KJKJ^A. \J.\J\J^±. 

0.0016 0.0015 0.0020 
0.91 0.89 0.96 


(P^""^)dat 

r 


0.16 0.40 0.31 
0.17 0.21 0.33 
0.87 0.19 0.34 


r [cov("'=*^] 


6.9 10-6 1.5 10-6 6.5 lO"'^ 
1.9 10-^ 1.4 10-6 2.6 10"^ 
0.98 0.47 -0.04 



Table 6.14: Statistical estimators for the ensemble of trained neural networks, for those experiments included 
in the fit. The replica sample consists of A'^rop =1000 neural networks. 



function of the number of genetic algorithms generations. The fact that one has error reduction, 
as has been explained in is the sign that the network has found an underlying law to the 
experimental data, in this case the lepton energy spectrum. 

Another relevant estimator is the scatter correlations of the spread of the points (see Fig. 16.18(1 . 
One can define similarly a scatter correlation for the net correlation Pj-"'^*^ , also represented in Fig. 
16.181 for the Babar experimental data. We observe that when the training ends both values of r 
are close to 1, a sign that errors and correlations arc being faithfully reproduced. Another relevant 

(k) 

estimator of the goodness of the fit is the distribution of both £"2 and of the training lengths 
over the replica sample, see Fig. 16.191 We have checked that these two distributions satisfy the 
requirements described in Section [5.31 

We can compare also fits with and without the inclusion of kinematical constraints, to see which 
is the effect of their implementation. The endpoint constraint at Ei = S'max is crucial to obtain 
convergent results. In Fig. 16.201 one can observe that when the endpoint constraint at = and 
the positivity constraint are removed the error becomes very large at small and on top of that 
sightly negative at large Ei near the endpoint. Note that the physical value for the spectrum at the 
endpoint, dT/dEi(Ei = 0) = 0, is contained within the small- i?; error bars, as expected. 

In Fig. 16. 211 we show a comparison of a fit of a single experiment, Babar. We observe that when 
only the Babar data is incorporated in the fit, the error at small values of Ei is much larger. This 
is so because, as discussed above, the Belle data, which extends to lower values of Ei, is crucial to 
determine the low Ei behavior of the lepton spectrum. Finally, in Fig. 16. 211 we show that our results 
are independent of the precise choice of ni and 712 in Eq. 16.331 In particular using ni = 1.5 and 
77.2 = 1.5 results in the same fit as when using the reference values, ni = 1 and 712 = 2. 

As one example of the applications of the present parametrization of the lepton energy spectrum, 
in this section our results are compared with the theoretical calculation of Ref. 'CT (AGRU). Their 
formalism allows the computation of moments of arbitrary differential distributions from scmileptonic 
B meson decays, with arbitrary kinematical cuts, like the lepton energy spectrum in charmed decays 
that is analyzed in the present work. In particular their computation of the lepton energy spectrum 
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Figure 6.17: 

Total Xtot, Eq- l'')-76l of the replica sample, compared with average error, (x^)- 
will be studied, which they define as 

with E = E/nih, and where the leading order partonic semileptonic decay rate Flo is given by 

Flo =ro|Kbpzo(p) , (6.42) 

where Fq is defined in Eq. 14.511 p = m^/m^ and the phase space factor is defined in Eq. 14.411 These 
moments can be related to the experimentally measured moments, defined in Eqns. 16.251 - IH?^ in a 
straightforward way, for example for the first two moments one has 

Mo = tbToNo, Ml ^ mfc-i , (6.43) 

and similarly for the other moments. 

In Figs. 16.221 the results of j^U both at leading order (LO) and at next-to-leading order (NLO) 
are compared with the moments obtained from our parametrization as a function of the lower cut 
in the lepton energy Eq. Comparing results at different perturbative orders is interesting to asses 
the behavior of the perturbative expansion. One should take into account in this comparison that 
the results of are purely perturbative, therefore the difference between the two results could 
be an indicator of the size of the missing nonperturbative corrections. Another interesting feature 
is that while for Mq, the partial branching fraction, the NLO corrections are sizable and bring the 
theoretical prediction in better agreement with the experimental measurement, for Mi (which is the 
ratio of two perturbative expansions) the size of the perturbative corrections is much smaller. In 
addition, we show in Fig. IH^ the comparison for the n = 4 moment (not measured experimentally) 
with an estimation of the uncertainties in the theoretical prediction, as described in ^5 . 

A more general comparison with theoretical predictions should include also the known nonpertur- 
bative power corrections up to order 0{l/ml) to the expressions for the moments of the spectrum, 
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Figure 6.18: 

Average error of the data points as computed from the neural network sample, Eq. 15.171 as compared with 
the experimental value (left). Dependence during the training of the values of the scatter correlations, Eq. 
I5.9UI during the training, for the Babar experimental data (right). 



since in this case the difference of the theoretical results from our parametrization would indicate the 
size of the missing unknown corrections, both perturbative and nonperturbative. A more detailed 
study of this point, together with an analysis of possible violations of local quark-hadron duality 
|164j . is left for future work. 

As another application of our parametrization of the lepton energy spectrum, it will be used 
to determine the b quark mass ml^ from the experimental data using a novel strategy. To this 
purpose the technique of Ref. jl65| will be used, which consists on the minimization of the size 
of higher order corrections to obtain sets of moments of the lepton energy spectrum which have 
reduced theoretical uncertainty for the extraction of nonperturbative parameters like A15 and Ai. 
In Ref. [S| we describe in detail the method we use to determine the b quark mass from our neural 
network parametrization of the leptonic spectrum, which is summarized here. 

The moments that minimize the impact of the higher order nonperturbative corrections are given 

by 



Ri = ^ '—^ , (6.44) 

Jl ^IdEl^^l 



and 



i?2 = '-^ . (6.45) 

J0.8 dEl'^^l 

The full expression for this moments in terms of heavy-quark non-perturbative parameters can be 
found in Ref. |165| . These leptonic moments i?i and R2 depend on 9 nonperturbative parameters, 
up to 0(l/m^): Ais, Ai and A2, and six matrix elements, pi, P2, '''i, ^2, and ta, that contribute at 
order 1/to^ in the heavy quark expansion, of which not all of them are independent |58[ll66j . 

The most relevant feature of these leptonic moments Ri and R2 is that they have non-integer 
powers and to the best of our knowledge have not been yet experimentally measured, at least in a 
published form. Therefore, the values of Ri and R2 that will be used in this analysis are extracted 
from our neural network parametrization of the lepton spectrum, which allows the computation 
of arbitrary moments, together with their associated error and correlation. Let us recall that the 
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Figure 6.19: 

Distribution of error functions, -E2*\ over the sample of trained replicas (left) and distribution of training 
lenghts in GA generations (right). 



central values are determined as 

/ (nct)\ ^J_y ^(nct)(.) ^(„et)(.) ^ ,E. ^^O^^l ( ) 

^ ' ^^'■'^P fe=i Ji Ei^^^{Ei)dEi 

and similarly for i?2, and the error and the correlation of the moments Ri and R2 are computed 
in the standard way. The following values for the moments with the associated errors and their 
correlation are obtained, 

that as expected are highly correlated. Then to determine the nonperturbative parameters A15 and 
Ai the associated is minimized, 

Xl, = (i?!'-*^ ~ Rlt''^) (cov-1)^^. - i?f )) , (6.48) 

where cov~-^ is the inverse of the covariance matrix associated to the two moments and 

i^j"***', and i?-*'^'' (Ai5,Ai) is the theoretical prediction for these moments as a function of the two 
nonperturbative parameters |165) . 

Once the values of Ais and Ai have been determined from the minimization of Eq. 16.481 one 
obtains for the b quark mass m]f mass in the IS scheme the following value: 

ml^ = rhB- Ais = (4.84 ± Cie'^'^P ± 0.05"^) GeV = (4.84 ± 0.17*°*) GeV , (6.49) 

From the above results one observes that the dominant source of uncertainty is the experimental 
uncertainty, that is, the uncertainty associated to our parametrization of the lepton energy spectrum. 
This determination of the b quark mass is consistent with determinations from other analysis. The 
b quark mass has been determined using different techniques, like the sum rule approach, using 
either non-relativistic |167[ 11681 1169) or relativistic [17UI I171j sum rules, global fits of moments of 
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Figure 6.20: 

Comparison of the lepton energy spectrum when no kinematic constraints are incorporated. 

differential distributions in B decays, |159l 116^1 1161) . the renormalon analysis of Ref. |172| . and 
several other methods related to heavy-quarkonium physics |173[ 1174) (see |175) for a review). To 
compare our results with some of the above references, it is useful to relate the ml^ mass to the 
MS-bar ffii, (ffib) mass jl76| . and once the conversion is performed the value 



is obtained, where we have used Q!s(Af^) = 0.1182 and included perturbative corrections up to 
two loops. It turns out that our determination of is not competitive with respect to other 
determinations due to the large uncertainties associated to the fact that only two moments we use 
in the determination of these parameters. The inclusion of additional moments in the fit would 
constrain more the nonpcrturbative parameters and reduce the experimental uncertainty associated 
to them. 

For the nonpcrturbative parameter Ai the following value is obtained 



As in the determination of A15 it can be seen that the theoretical uncertainties are smaller than the 
experimental ones, which are the dominant ones. Our result for the parameter Ai is consistent with 
other extractions in the context of global fits of B decay data [1591 1163j , but again not competitive 
due to the rather large uncertainties. 

To summarize, in this part of the thesis we have presented a determination of the probability 
density in the space of the lepton energy spectrum from semileptonic B meson decays, based on the 
latest available data from the Babar, Belle and Cleo collaborations. In addition, this application 
shows the implementation of a well defined strategy to reconstruct functions with uncertainties when 
the only available experimental information comes through convolutions of these functions. As a 
byproduct of our analysis, using our parametrization of the lepton spectrum, we have extracted the 
nonperturbative parameters A15 and Ai, with a method that reduces the contributions from the 
theoretical uncertainties. 

The number of possible applications of this strategy to other problems in B physics is rather 
large, and is discussed in some detail in Ref. 13 . The most promising application is to use our 
neural network strategy to construct a parametrization of the shape function S{k) of the B meson, a 




(6.50) 




(6.51) 




Figure 6.21: 

Comparison of the lepton energy spectrum when the parameters ni and n2 are changed (left). Comparison 
of the lepton energy spectrum when only Babar data is incorporated in the fit with the result when all 
experiments are incorporated in the fit (right). 



universal characteristic of the B meson governing inclusive decay spectra in processes with massless 
partons in the final state, as extracted from the B Xs^ and B — > X^liy decay modes. In this case 
we have additional theoretical information on its behaviour. For example, at tree level its moments 

An = J dkk"S{k) , (6.52) 

have to satisfy Aq = 1, Ai = and A2 = where is a nonperturbative parameter of the heavy 

quark expansion. At higher orders these relations are theoretically more controversial [1771 1178| . 
Since the uncertainty from the extraction of S{k) is the dominant source of theoretical uncertainty 
in some CKM matrix elements extractions, it would therefore be interesting to estimate again this 
uncertainty using the technique presented in this work, since in the current approach |179j the shape 
function uncertainties are estimated in a rather crude way, just trying different functional forms 
compatible with the theoretical constraints. 
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Figure 6.22: 

Comparison of tlie results of Ref. |52| on tiie partial branching ratio Eq. l6.25l both at leading order (LO) 
and at next-to-leading order (NLO) with the same quantity computed from our parametrization. 




Cut on lepton energy Eq [GeV] 



Figure 6.23: 

Comparison of the results of Ref. |52| on the n = 4 moment and at next-to-leading order (NLO), with the 
corresponding theoretical uncertainties, with the same quantity computed from the neural network 

parametrization parametrization . 
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6.4 Parton distribution functions 

In this Section we describe the most important apphcation of the general strategy introduced in 
Chapter |31 Indeed, it was the determination of the probabihty measure in the space of parton 
distributions the main motivation for the development of the technique that is the main subject of 
this thesis. First we describe an alternative approach to evolve parton distributions, required by the 
use of neural networks as interpolants for parton distributions, and then we present results on the 
neural network parametrization of the nonsinglet parton distribution qNsi^i Qo) from experimental 
data on the nonsinglet structure function F^^{x, Q^). 

6.4.1 A new approach to parton evolution 

In order to construct the probability measure in the space of nonsinglet parton distributions, for 
reasons to be described in the following, we need a dedicated parton evolution formalism. In this 
Section the strategy that will be used for the evolution of parton distributions is introduced. For 
the present application only nonsinglet parton distributions will be considered, and therefore the 
description of this parton evolution technique will be restricted to the nonsinglet sector. This 
discussion can be generalized straightforwardly to the singlet sector and will be throughly described 
inRef. [Mil. 

There are two classes of methods for solving the perturbative QCD DGLAP evolution equa- 
tions, introduced in Section l4.1.2l the N-space methods (like the one described in which solve 
analytically the DGLAP evolution equations in Mellin space and then compute the inverse Mellin 
transformation to x-space, and the x-space methods (like |182| ). which perform a direct numerical 
integration of the integro-differential DGLAP equations. Both methods have different advantages 
and drawbacks, but in principle they lead to a similar accuracy in the evolution of parton distribu- 
tions, as shown in the dedicated analysis of Ref. 183 . However, since we are going to parametrize 
parton distributions not with simple functional forms but with neural networks, none of the standard 
techniques can be applied to our case. For example, in standard N-space evolution codes one needs 
to perform the Mellin transform of the parton distribution and extend it to complex values of N, but 
in our case this is not allowed since an analytical expansion of neural networks to complex values of 
the inputs and the outputs is mathematically ill-defined. 

Hence we need to use a hybrid strategy: we will solve the DGLAP evolution equations |38ll40llS^ 
in N-space, introduced in Section l4.1.2l and then we will Mellin invert only the evolution factor T{N) 
in Eq. 14.351 so that the evolved x-space parton distribution q{x, Q^) can be written as the convolution 
of the x-space parton distribution at the initial evolution scale q(x, Qq) and an x-space evolution 
factor r(x). Schematically, we will have 



Then we will interpolate the x-space evolution factor r(a;), so that the heavy numerical task of its 
computation is decoupled from the determination of the parameters describing the parton distribu- 
tion from experimental data. With all these considerations taken into account one ends up with a 
fast and efficient evolution code, which will described in this Section. 

First of all we define the notation that will be used for the strong coupling constant. The 
convention that we use is 





(6.53) 




(6.54) 



where the beta function coefficients are given by 



(6.55) 
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the scheme-dependent coefRcent f32 is given in |181) . and Nf is the number of active quark flavors. 

The expHcit solution for the above equation at the NNLO accuracy can be written in terms of a 
boundary condition as (M^) and is given by 



(^?')nnlo = (Q')lo (^1 + (Q')lo ("^ (^?')lo - i^'^z)) ib2 bl) + a, {Q')^^^ h hi 
where we have defined in a recursive way, 



as {C 



as 
(6.56) 



iQ') NLO = iQ')LO [ 1 - (^')lo ( 1 + ^oas (M|) In ^ ) ) , (6.57) 



as 



l + /3oa,(Af|)lnf 



and we have defined = Pk/ Pa- 

The evolution equations for nonsinglet combinations of parton distributions, as we have seen in 
Section [4. 1.21 read in Mellin space 

= '^^^lNsiN,as {Q^))<f^s{N.Q^) , (6.59) 

where z = ±,w stand for the three different nonsinglet combinations that evolve independently at 
NNLO, given by 

(lNS,ik^ {lk±Qk) , QnS "^^ilr - qr) ■ (6.60) 

In the following discussion we assume that in each case we use the appropriate anomalous dimension 
for each type of nonsinglet parton distribution, and since we are restricted now to the analysis 
of nonsinglet parton distributions, we will assume also that all quantities (parton distributions or 
anomalous dimensions) correspond to this nonsinglet sector. The nonsinglet anomalous dimension 
has been computed in powers of as (Q^) up to NNLO, 

j{N,as{Q'))=j^"\N) + ^^j(^HN)+(^^^^ j^'Hn) , (6.61) 

where the N-space anomalous dimension was computed at LO in Ref. ^U], at NLO in Ref. |184| . 
and recently the full NNLO contribution was computed in Ref. j^H]. The leading order nonsinglet 
anomalous dimension has the explicit form 



^ 1 



^^^k NiN+l) 



Cf^^, (6.62) 



which has the following large-N limit, 



lim 7('')(iV) = -4Ci.lniV +C'(7V"), (6.63) 

which will be needed for the computation of the large-x limit of the x-space evolution factor T{x). 
The solution to the nonsinglet evolution equation, Eq. 16.591 reads at NNLO accuracy 

qiN,Q^) =T{N,as (Q^) ,as {Ql))qiN,Ql) , (6.64) 
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where the N-space evolution factor T{N) is given by 

-7('"(A')//5o 



+ (a. (Q')' - a. (Qg)') C/^'^(iV) - (g2) (f/^^) (iV)) ' j , (6.65) 

where we have defined the nonsinglct evolution coefficients as 

ulv'liN) = (i^'Hn) - h^^^'HN)) , (6.66) 



ul^lW = [j^7^'\N)+hul,'l(N) - |7("H^) ) . (6.67) 



Once we have solved the DGLAP equations in Mellin space, Eq. 16.641 we Mellin invert the 
evolution factor in Eq. 16. 651 to perform parton evolution in x-space. Using the convolution theorem 
one can check that the x-space evolved parton distribution is given by 

q{x,Q^)^ f' ^ny,a4Q'),a4Ql))q(-,Ql) , (6.68) 

J X y \y / 

where the evolution kernel T{x) is the Mellin inverse of Eq. 16.651 

nC-\-ioO ^ AT 

T{x,a, {Q^),as (QI)) = / —x~^T{N,as (Q^) (Q^)) , (6.69) 

where c lies to the right of the rightmost singularity of r(A^). That is, one can check that the Mellin 
transform of the LHS of Eq. E^iis equal to the RHS of Eq. WMi 

1 

dxx^-ig(x, Q') = r (TV, as (Q2) , a, (Q^)) g(iv, g2) . (6.70) 

The complicated numerical task in this evolution formalism is the numerical computation of the 
Mellin inverse transformation, Eq. 16.691 However note that as opposite to standard N-space par- 
ton evolution methods, the evolution of the parton distribution can be decoupled from the Mellin 
inversion of the evolution factor, which is the most time consuming task. 

We will use the Fixed Talbot algorithm to perform the numerical inversion of the Mellin transform 
Eq. 16.691 For a detailed description of this algorithm and its efhciency see Refs. |185l I186| . 
The Fixed Talbot algorithm for the numerical inversion of Mellin-Laplace transforms is specially 
accurate for the following reason: the numerical computation of inverse Mellin transforms is in 
general difficult since the integrand is highly oscillatory since the integration path moves through 
the complex plane. The Fixed Talbot algorithm bypasses this problem by choosing a path in the 
complex plane which minimizes the imaginary part of the integrand and therefore minimizes the 
impact of these oscillations. In the Fixed Talbot algorithm a generic inverse Mellin transform is 
computed as 

/(^) = I x-^f{N)dN , (6.71) 



where the Talbot path C is defined by the condition 

N{9) =re{l/ta.n9 + i), < 6 < n , (6.72) 
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Figure 6.24: 

The x-space evolution factor F (^x, (Q^) , (Qo)) fo'' different perturbative orders at = 10* GeV'^ 
(left) and for different evolution lenghts at leading order (right). In all cases the starting evolution scale is 
Qq — 2 GeV^. Note the sharp rise of the evolution factor at large values of x. 



where M is the number of required precision digits. The Talbot path minimizes the contribution 
from oscillatory terms to the inverse Mellin Eq. 16.711 and therefore avoids the associated numerical 
instabilities. The resulting integral, Eq. 16.711 can be computed with adaptative gaussian quadratures 
to any desired accuracy. 

The x-space evolution factor F (x, as {Q^) , ctg (Qq)), Eq. 16.691 has a flat behavior at intermediate 
X together with a growing at both large and small x, where the behavior in both limits can be 
computed analytically. Now the explicit analytic expressions for the evolution factor T{x) at both 
large x and small x will be computed. These results are interesting both for themselves, to obtain 
a more quantitative understanding of our evolution technique, and also for practical purposes, since 
they allow to perform a more efhcient interpolation of F(a;). In Fig. 16.241 we represent the x-space 
evolution factor F(a:) for different perturbative orders and different evolution lenghts. Note that 
in the x range relevant for nonsinglet evolution, x > 5 10~^, the perturbative expansion for r(a;) 
appears to be near to convergence at the NNLO level. Note also that the small-x behaviour is very 
smooth in the relevant x range, unlike the large-x one. 

First of all, the large x limit of the evolution kernel will be computed in two equivalent ways. 
At leading order in as (Q^), m the large x limit, the dominant contribution to the evolution kernel 
comes from the large N limit of the LO anomalous dimension, Eq. 16.631 and therefore one has 



2Cf In N/bo 



r„.M=p^.-» ^41:1) ■ , ,6.74, 

J-ioo 27ri yas (Q5) J 
then one has to use the Mellin transform 

'x^-iln''-ii = ffl, rj>0, (6.75) 



x TV 



and the final result is 



r,^i {x, as (Q2) , as {Ql)) = — ^ In - , (6.76) 



= («i5) 



1 \ — 1+-^^ In 7 — ?7T- 

1 \ ''O ^7(Q2) 
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so that at large x one has 



{x, as (g^) , a. {Ql)) ~ (1 - x)^^Q'-'io) , a(Q2, QI) = -I + 



2Cp 



In 



bo as 



(6.77) 



therefore, the growth of r(a:) at large x depends only mildly on the evolution lenght. 

A second related method makes use of the analytic results for the all logarithmic orders Mellin 
inversions of Ref . I187j . The large x limit of the evolution kernel is given by Eq. 16.741 Its inverse 
Mellin transformation can be performed analytically using the formulas of Rcf. jl87| . which yield 



A("-i)(l) 



{ {n-l)\ 



E 



A("-i)(l) 



{ (n-1)! 



2Cf 



In 



r,^i {x,as {Q^),as (Ql)) =A 




2Cf ln{l-x)/bo' 



2Cf ln(l-a;)/bo' 



(l-x) 



-1+2^ In 



+ 0(il-xf,as) = 
+ 0{{l~x)",as) 



(6.78) 



where A (77) = l/r(77), which coincides with Eq. 16.761 once one takes into account that for large x 
one has that ln(l/a:) ~ (1 — x). On top of the computation of the large-x limit of r(x), the above 
derivation shows that x-space evolution factor can be defined as a distribution, just as standard 
splitting functions. 

We can compute also analytically the small x behavior. In the nonsinglet sector T(x) grows at 
small X as dictated by Double Asymptotic Scaling 188 . To see this, recall that r(a::) at low x is 
given by Eq. 16.691 expanding the LO anomalous dimension around its rightmost singularity, in this 
case the N — pole, so that one has 



Tx^o {x, Us (Q^) , as (Qo)) 



+''°° dN ( 1 

exp In — 

2m \ X 




1 

N 



(6.79) 



If in the expression above we perform a saddle point integration, one obtains that the leading small— a; 
behavior of the nonsinglet x-space evolution kernel is given by 



r^^o {x, as (Q^) , as (QI)) = TV exp 




2^-lniln 



as{Q^ 



(6.80) 



where A/" is a normalization factor. The growing of T(x) at low x is more important for larger values 
of Q^, as can be seen in Fig. 16.241 In the singlet sector, the leading behavior of r(a;) at low x can 
also be computed exactly and is given again by Double Asymptotic Scaling j21| . 



r^^o {x, as (Q^) , as (Ql)) = TV- exp - 



Po as (Q^) X 



(6.81) 



which is much larger at small x than the corresponding non-singlet result, Eq. 16.801 In the singlet 
case therefore, one would need to substract the effects of the small-x growth of the evolution factor, 
in a similar way that is done now with the large-x growth in the interpolation of T{x), to be discussed 
in the following. 

It is known that all splitting functions Pij(x,as (Q^)), except the non diagonal entries of the 
singlet matrix, diverge when x = 1. Therefore the nonsinglet evolution factor r(a:), Eq. 16.691 will 
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l-x 

Figure 6.25: 

The x-space evolution factor T{x) once tlie leading large-x behaviour has been substracted, Eq. 16.971 for 
different perturbative orders for — 10'* GeV^. Note that the resulting function is much more smooth, 

and therefore much more efficient to interpolate. 



likewise be divergent at x = 1. In a similar way as what it is done to regularize splitting functions, 
also evolution factors T{x) can be defined as distributions, as we have seen explicitely in Eq. 16.781 
This divergent yet integrable behavior poses also numerical problems that will be analyzed now. 
Now let us consider the evolution of a generic parton distribution, 

/(x,Q2)^ f' ^r{y)f (-,Ql) , (6.82) 

Jx y \y J 

and now let us add and subtract a delta function to the evolution factor r(?;), 

r(y) -r(t/)+ 7(5(1 -y)- 7(5(1 - y) , j = f T{z)dz . (6.83) 

Jo 

Inserting this decomposition in Eq. 16. 821 one finds that it can be written as 

fix, Q^) = j' ^T{y) (^f (^^,Qlj - yf {x, Qg)^ + / {x, Ql) j' r{y)dy , (6.84) 

so that now thanks to the subtraction that we have performed all the integrals in Eq. 16. 841 converge 
and thus can be computed numerically. Note that this is equivalent to the definition of the x-space 
evolution factor in terms of the plus distribution prescription, 

fix, Q2) = r ^r(y)+/ (-, Ql) + f {x, Ql) f ny)dy , (6.85) 

J X y \y / J X 

where the plus distribution is defined in the standard way |14j . 

Therefore, we will use to evolve our parton distributions Eq. 16.841 which can be written as 

q{x, Q2) = q2)^(^^ (g2) ^ {Ql))+J' j^iy, as (Q") , {Ql)) (^q Qo) " y^li^, QD) 

(6.86) 
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where we have defined 



j{x,as{Q^),as{Ql))^ f dyr{y,as{Q^),as{Ql)) , 7(0) = 7 • (6.87) 
Note that 7 can be written as 

At leading order the anomalous dimension satisfies ^'^'^\N = 1) = and therefore it follows that 
7 = 1. This result follows from momentum conservation. At higher orders this result applies only to 
certain combinations of nonsinglet parton distributions. For practical implementations we use the 
equality 

-l{x,as{Q^),a,{Ql))=V{N = 1)- ( dyT{y,as {Q^) ,as {QI)) , (6.89) 

Jo 

since in the nonsinglet sector the integral in the above equation is very fast to compute. Note 
that the above property does not hold in the singlet sector since the gluon-gluon and gluon-quark 
anomalous dimensions have a pole at = 1. 

We have benchmarked our evolution formalism with the benchmark evolution tables first pre- 
sented in Ref. |183| and recently updated including the full NNLO anomalous dimension in Ref. [31] ■ 
The results and the accuracy of this benchmark can be seen in Table where we use exactly the 
same parameters than in [1831 1^ for parton evolution in the Fixed Flavor Number (FFN) scheme. 
More details about this benchmarking procedure of QCD evolution codes can be found in Ref |183j . 
We have checked that the accuracy is always of the order O (10~^), which is the required accuracy 
on modern QCD evolution codes. Similar checks have been performed for the evolution of other 
parton distributions as well as for evolution in the Variable Flavor Number (VFN) scheme. 

The experimental observable that determines the nonsinglet parton distribution is the nonsinglet 
structure function, defined as the difference between structure functions in the proton and in the 
deuteron, 

^(x, Q2) ^ F^{x, Q2) _ F!^{x, Q^) , (6.90) 
which is related to the nonsinglet parton distribution via a perturbative coefRcient function, 

F^''{x,Q^)^x f ^CMs{y,c.s{Q^))qNs(-,Q^\ , (6.91) 
where the coefficient function has the following expansion up to NNLO in perturbation theory, 

C^s{x,a.{Q'))^5{l-x) + '^^C^^l(x)+[^l^^ d^\{x). (6.92) 

The NLO coefficient C'^^^{x) was computed in Ref. |184j . while the NNLO nonsinglet coefficient 
function was first computed in Ref. |189| . However, for the NNLO coefficient function we do not 
use the exact result but rather the N-space parametrization of Ref. |19(J) , which is fast and accurate 
enough for our purposes. 

The way that we incorporate the coefficient functions into our evolution formalism is through a 
redefinition of the x-space evolution kernel, 

/•C + iOC J AT" 

T{x,as{Q^),as{Ql))^ —x-''C{N,as{Q^))T{N,as{Q^),as{Ql)), (6.93) 

Jc—ioa 
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so that now the nonsinglet structure function can be written in terms of the parton distribution at 
the initial evolution scale as 

Ff^(x,Q2) = ^ [' ^f{y,a^{Q') ,a4Ql))q(^,Qi) . (6.94) 

J X y w J 

The rationale behind this procedure is to improve the speed of the evolution code, that is, if coefficient 
functions where introduced as in Eq. 16.911 we would need to perform an additional convolution 
integral each time a structure function was computed. The only drawback of this method is that 
the evolution factor becomes process-dependent, while the bare evolution factor T{x), Eq. 16.691 is 
process independent and indeed it could be used by itself as an alternative procedure for evolution 
of standard parton dstributions. 

We use a Variable Flavor Number (VFN) scheme with zero mass partons to incorporate the 
effects of heavy quark thresholds in the evolution. At NNLO one has to take into account that both 
the strong coupling and the parton distributions are discontinuous when crossing the heavy quark 
thresholds. We compute as (Q^) in the Variable Flavor Number scheme, taking into account the 
discontinuity at NNLO at heavy quark thresholds, 

asj+i{m)) = asjim}) + (^g^ c^sjimjf , (6.95) 

where agj is the strong coupling in the effective theory with Nf active light quarl flavors, is 
the position of the heavy quarl threshold, and C2 = 14/3 was computed in |191) . For the parton 
distribution at heavy quark thresholds, the corresponding N-space matching condition is given by 

= q'-^\N,n^,) (^1 + (^^-^) (iv))^ , (6.96) 

where the NNLO matching coefficients are determined in Ref. |192| . A more refined treatment of 
heavy quark mass effects |193l I194j is postponed to the case of singlet evolution, since it is known 
that the influcence of heavy quark mass effects in the nonsinglet sector is rather small. 

The evolution approach described above is very accurate but also very CPU time consuming. 
This is so since one has to compute both the N-space evolution factor and its Mellin inverse each time 
one wants to evolve a parton distribution. This is specially a problem in our approach, where we 
are parametrizing our parton distributions with neural networks, with a very large parameter space 
and thus the minimization routine requires more time than in the standard approach. What we do 
then is to interpolate the evolution kernel r(a:;) so that the the evolution of parton distributions, Eq. 
16.861 is much faster. 

The only problem is since r(a:) grows heavily at large x, it is numerically difficult to interpolate 
it. A way to overcome this difficulty is to substract to the exact result for r(a;) the large-x behavior 
Eq. 16.761 so that the resulting function to be interpolated is a smooth one. We interpolate the 
subtracted x-space evolution kernel, defined as 

(0") .0.. m> = r!!:i::i:^^°MQg)) ■ 

where ra;^i(a;) is given by Eq. 16.761 and the inclusion of the coefficient function does not affect 
the leading large x behavior. We also interpolate j{x) in Eq. 16.861 In Fig 16.251 we represent the 
behaviour of the substracted evolution factor r'^'"*)(a;), it is clear that this functional dependence is 
much more efficient to interpolate than that of the bare evolution factor, represented in Fig. 16.251 
Therefore, the nonsinglet structure function will be given in terms of an interpolated evolution factor 
as 

Fi'^{x,Q')=x f ^f^''^'\x,as{Q'),as{Ql))T^^i{x,as{Q'),as{Ql))q(-,Ql) , (6.98) 

J X y \y / 
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instead of with Eq. 

We use Chebishev interpolation in x for all the values of where there is experimental data. The 
rationale for using Chebishev polynomials for the interpolation is that the Chebishev approximation 
for a given function is very close to the minimax polynomial, defined as the approximating polynomial 
with the smallest maximum deviation from the exact function. On top of this property, Chebishev 
interpolation is extremely stable, and the increase in accuracy as we increase the order of the 
interpolation can be controlled in a stringent way. We require an accuracy in the interpolation 
0(10"^) for all the {x,Q^) range covered by experimental data. This accuracy is enough for 
our present purposes since this is the accuracy to which the exact evolution formalism has been 
benchmarked. Thanks to Eq. 16.971 this accuracy is achieved with a relatively small number of 
Chebyshev polynomials, and therefore the interpolated evolution kernel is rather fast to be evaluated 
for an arbitrary value of x, and for those values of where there is experimental data. 

In order to incorporate data in our fit with small , we must take into account the target mass 
corrections to the nonsinglet structure function 195 . These target mass corrections are incorporated 
into our analysis using the following standard relations, 

T^NS,LT,TMCf ^2' ^'^^ {£.TMC , Q"^) . a^^^^Tfi: fa nn^ 

F2 = 72 ^ 6— J— /2(Cta/c,Q ) , (6.99) 



where we have defined 

mTMcQ') = / dz^ , (6.100) 

2x AM'^x'^ 

where M is the mass of an isoscalar nucleus. This way one separates kinematical target mass 
corrections from genuine dynamical target mass corrections. 

In summary, in this part of the thesis we have described an alternative parton evolution formalism 
which combines the advantages of both x-space and N-space evolution codes. This formalism has 
been described for the nonsinglet sector, and its extension to the singlet sector will be discussed in 
Ref. jl80| . In the next Section we will use this evolution formalism to construct a neural network 
parametrization of the nonsinglet parton distribution. 
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xUy{x,Q'^) (LH) 



y^) (FT) Rel. error 



Leading order 



10-^ 


5.7722 


IQ- 


5 


5.7722 


10- 


-5 


3.3760 


10- 


-6 


10"^ 


3.3373 


IQ- 


4 


3.3373 


10- 


-4 


1.6880 


10- 


-6 


10-5 


1.8724 


IQ- 


3 


1.8724 


10- 


-3 


1.9212 


10- 


-6 


10-4 


1.0057 


10- 


2 


1.0057 


10- 


-2 


1.4095 


10- 


-6 


10-3 


5.0392 


10- 


2 


5.0392 


10- 


-2 


2.6145 


10- 


-6 


10-2 


2.1955 


10- 


1 


2.1955 


10- 


-1 


3.1065 


10- 


-6 


0.1 


5.7267 


10- 


1 


5.7267 


10- 


-1 


6.4524 


10- 


6 


0.3 


3.7925 


10- 


1 


3.7925 


10- 


-1 


9.2674 


10- 


-6 


0.5 


1.3476 


10- 


1 


1.3476 


10- 


-1 


1.1307 


10- 


-5 


0.7 


2.3123 


10- 


2 


2.3122 


10- 


-2 


2.1165 


10- 


-5 


0.9 


4.3443 


10- 


4 


4.3440 


10- 


-4 


6.3630 


10- 


-5 



Next-to-Leading order 



10-' 


1 


0616 


10- 


4 


1 


0616 


10- 


-4 


2.1462 


10- 


6 


10-^ 


5 


4177 


10- 


4 


5 


4177 


10- 


-4 


8.7799 


10- 


6 


10-5 


2 


6870 


10- 


3 


2 


6870 


10- 


-3 


9.7796 


10- 


6 


10-4 


1 


2841 


10- 


2 


1 


2841 


10- 


-2 


1.3380 


10- 


5 


10-3 


5 


7926 


10- 


2 


5 


7926 


10- 


-2 


8.5063 


10- 


6 


10-2 


2 


3026 


10- 


1 


2 


3026 


10- 


-1 


3.0757 


10- 


7 


0.1 


5 


5452 


10- 


1 


5 


5452 


10- 


-1 


7.6419 


10- 


7 


0.3 


3 


5393 


10- 


1 


3 


5393 


10- 


-1 


2.6979 


10- 


6 


0.5 


1 


2271 


10- 


1 


1 


2271 


10- 


-1 


2.4466 


10- 


5 


0.7 


2 


0429 


10- 


2 


2 


0429 


10- 


-2 


1.4810 


10- 


5 


0.9 


3 


6096 


10- 


4 


3 


6094 


10- 


-4 


6.0762 


10- 


5 



Next-to-Next-to-Leading order 



10-^ 


1 


5287 


10- 


4 


1 


5287 


10- 


-4 


1.5497 


10- 


-5 


10-^ 


6 


9176 


10- 


4 


6 


9176 


10- 


-4 


5.0711 


10- 


-6 


10-5 


3 


0981 


10- 


3 


3 


0981 


10- 


-3 


9.5455 


10- 


-6 


10-4 


1 


3722 


10- 


2 


1 


3722 


10- 


-2 


1.8022 


10- 


-5 


10-3 


5 


9160 


10- 


2 


5 


9160 


10- 


-2 


5.0631 


10- 


-6 


10-2 


2 


3078 


10- 


1 


2 


3078 


10- 


-1 


2.4853 


10- 


-6 


0.1 


5 


5177 


10- 


1 


5 


5177 


10- 


-1 


2.4747 


10- 


-6 


0.3 


3 


5071 


10- 


1 


3 


5071 


10- 


-1 


2.8430 


10- 


-7 


0.5 


1 


2117 


10- 


1 


1 


2117 


10- 


-1 


3.5893 


10- 


-5 


0.7 


2 


0077 


10- 


2 


2 


0077 


10- 


-2 


5.5823 


10- 


6 


0.9 


3 


5111 


10- 


4 


3 


5109 


10- 


-4 


5.8172 


10- 


-5 



Table 6.15: 

Benchmark tables for the evolution formalism described in this Section in the Fixed Flavor Number Scheme 
with Nf = 4 active flavors. The procedure for the benchmarking is described in detail in Section of 1.3 of 
Ref. |188) and in Section 4.4 of We use the same notation as in the above references. 
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6.4.2 The non-singlet parton distribution 

Once the formalism that will be used for the evolution of the nonsinglet parton distribution has been 
introduced in the previous Section, we now turn to the first application of the general technique 
described in Chapter [S] to the parametrization of parton distributions functions. We will consider 
the parametrization of the nonsinglet parton distribution qpfs{x, Qq) from experimental data on the 
nonsinglet structure function, F^^{x,Q'^), defined in Eq. 15.911 The restriction to the nonsinglet 
sector makes the problem technically simpler, since nonsinglet evolution requires a single parton 
distribution parametrized with a neural network. Hence, nonsinglet evolution does not involve the 
complications from the training of several neural networks. In the general case of singlet evolution, 
one requires the simultaneous minimization of additional neural networks which parametrize different 
independent combinations of quark parton distributions as well as the gluon distribution, and this 
issue will be considered in Ref. |18()j . In Fig. I6.26l we show a diagram which summarizes our approach 
to parametrize the nonsinglet parton distribution qpfsix,QQ). Note that the main difference with 
respect Rcf. 11 is that the effects of DGLAP parton evolution reduce the dimensionality of the 
quantity to be parametrized from d = 2 for F^^{x, Q^) to d = 1 for the parton distribution qi\fs{x). 




Figure 6.26: 

General strategy for the parametrization of the nonsinglet parton distribution qNs{x,Qo) from 
experimental data on the nonsinglet structure function F2^{x,Q'^). 



For the parametrization of the nonsinglet parton distribution q^six, Qq), the same data on the 
nonsinglet structure function F^^^XjQ"^) that was used in Ref. ^I] from the NMC ji:-{8| and the 
BCDMS |139l I14U| collaborations will be used in the present analysis. The main features of this 
experimental data from was discussed in Ref. 11.. While BCDMS covers the large-x, large-Q^ 
kinematical region, NMC covers the complementary small- a;, small region, with some overlap in 
the intermediate regions. BCDMS data is rather precise, while the small- a; NMC data has larger 
uncertainties. The kinematical coverage of the F^^ experimental data available from these two 
experiments can be seen in Fig. 16.271 The requirement of a simultanous measurement of the proton 
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Figure 6.27: 

Kinematical coverage of the available experimental data on the F^^ {x, Q^) structure function in the 
(a;,Q^) plane. Note that fixed target scattering geometry implies that large-Q^ data is at large-i and 

conversely. 



F2 and deuteron F2 structure functions for the determination of the nonsinglet combination, Eq. 
16.91)1 restricts sizeably this kinematical coverage as compared with the one for only, see Fig. 16.51 
However, there are proposals |196l I197| for future deep-inelastic scattering facilities which should 
extend this kinematical range in the small-x region and in addition reduce the associated statistical 
and systematic uncertainties. The inclusion of additional high statistics data from JLAB |198[I199] 
at the largest values of x would require the use of resummed parton evolution. 

The main difference with respect the dataset used in Ref. TT' is that now we have applied 
to this dataset the kinematical cuts > 3 GeV^ and > 6.25 GeV^. The motivation for 
the two kinematical cuts is to remove those data points for which the application of perturbation 
theory is questionable, as has been discussed in Section l4.1.2l The cut in is required to remove 
experimental data for which higher twist corrections might be sizable, and the cut in removes 
those data points at very large x for which the application of unresummed perturbation theory is 
not reliable. After these kinematical cuts, we are left with a total of A^dat = 483 data points. 

In Table I^^TCI the features of the experimental data that is included in the present analysis are 
summarized. Note that the average error is substantially larger for the NMC experiment than for 
the more precise BCDMS measurements. All the experiments included in our analysis provide full 
correlated systematics, as well as normalization errors. The description of the different uncertainties 
of the experimental data that will be used can be found in Ref. JI] . 



Experiment Ref. 


X range range (GeV'^) 


A^dat 


(o-tot) 


(p) (cov) 


NMC ^138^ 


9.0 10-^ - 4.7 10-1 3.2-61 


229 


102.0 


0.034 0.200 


BCDMS 


7.0 10-^ - 7.5 10-1 8.7-230 


254 


24.6 


0.163 0.007 



Table 6. 16: Experiments included in this analysis. Note that the values of a and cov are given as percentages. 
The data from the two experiments partially overlaps in the medium- a;, medium-Q^ region. 



The first step of our strategy, as discussed in Chapter |31 is to construct a Monte Carlo sample 
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N 

* rep 


10 100 1000 


{PE 


20% 6.4% 1.3% 
0.97 0.997 0.999 


r J/dat 


6.1 10-5 1.9 10-5 6.7 10-^ 
33% 11% 3% 
0.011 0.011 0.011 
0.94 0.994 0.999 




0.10 9.4 10-3 1.0 10-3 
0.182 0.097 0.100 
0.47 0.79 0.97 




5.5 10-9 1.7 10-1° 5.7 10"" 
1.3 10-5 7.6 10-6 8.1 10-6 
0.41 0.81 0.975 



Table 6.17: Comparison between experimental and Monte Carlo data. 

The experimental data have {(^^'""^^) = 0.011, (p^"'"-'^) = 0.107 and (cov('=''p))^^^ = 8.63 IQ-^ 



of replicas of the experimental data. Note that as in the case of the parametrization of the lepton 
energy spectra discussed in Section 16.31 the quantity for which the sample of replicas is generated, 
{x,Q^), does not coincide with the quantity that is parametrized with neural networks, the 
parton distribution qpfs{x, Qq). Since in this case the experimental data is the same as in Ref. |11| . 
we use the same relations to generate the Monte Carlo sample, namely Eq. 15.31 which for the present 
situation reads, 

(6.102) 

In Table I07I the statistical estimators from the sample of generated replicas are presented, for the 
data set that is used in the fit, where the statistical estimators have been defined in Section 
This table shows that a sample of 1000 replicas is sufficient to ensure average scatter correlations of 
99% and accuracies of a few percent on structure functions, errors and correlations. 

Once the Monte Carlo sample of replicas of the structure function F^^{x, Q^) has been generated, 
the second step is to train a neural network on each replica of the experimental data. However the 
situation is now more complicated than in the structure function case |11| . Now, while the Monte 
Carlo sample is constructed from the experimental data on the nonsinglet structure function, the 
neural networks parametrize the nonsinglet parton distribution. Therefore, following Eqns. 16.861 
and 16.911 the k-th neural network which parametrizes a parton distribution at the starting evolution 
scale Qq is related to the k-th replica of the experimental data on the nonsinglet structure function 
at {x, Q"^) as 

^ArS(nct)(.) ^2) ^J'^f (q2) (q2)) ^(nct)(fc) /^^ ^ fc = 1 , . . . , iV^^ , 

J X y \y / 

(6.103) 

where the evolution factor F which includes the effect of the perturbative coefficient function 
Cjvs (x, as (Q^)) has been defined in Eq. 16.931 

In the present analysis the value of the strong coupling as {Q^) will be kept fixed at the current 



N. 



sys 



(fc) 
i=i 



dat 



The neural network approach to parton distributions 



130 



world average |162j given by 

as(M|) = 0.118 ± 0.002 , (6.104) 

and our fits will be repeated for the extreme values of the strong coupling allowed by the errors in 
Eq. 16.1041 to estimate the theoretical uncertainties associated to qNs{x, Qq) from the finite precision 
with which the strong coupling Ug (Q^) is determined from data. The determination of Og (A/f ) 
altogether with the nonsinglet parton distribution from experimental data is possible in principle, 
but it has been decided to have as (A/f ) as an external input since first the opposite complicates 
considerably the practical implementation of the evolution formalism discussed in Section f6.4.1l and 
second, the strong coupling in much better determined from other processes involving the strong 
interaction |20| . The interplay between the strong coupling and global parton distributions fits has 
been recently discussed in Ref. |2UU| . 

As been been discussed in detail in Section 15. 2. 31 there exist several techniques to implement the 
training of neural networks. In the present case the optimal training strategy has been found to be 
a single training epoch, in which the covariance matrix error, E^'^'^ , Eq. 15.341 is minimized for each 
replica. In the present case one has 

(6.105) 

with the covariance matrix defined in Eq. 15.351 Note that the error function is normalized to the 
number of degrees of freedom |10()| . The minimization algorithm used is genetic algorithms with 
dynamical stopping of the training. Weighted training will also be used in order to guarantee that 
at the end of the training the total x^, Eq. 15.761 which now reads 

, ,2 _ 1 V'V / pNSjnct) \ p^-5(cxp)\ / n //pAfS(not)\ p^^S(oxp)^ 

^ - A^dat - A^pa. [V^ Aep"^^ ) ^'^^ [{F. " j' 

(6.106) 

as computed for the two experiments has a similar value. 

The architecture of the neural network that parametrizes gNsi^^Qo) must be determined as 
discussed in Section f5.2.3l This optimal architecture is obtained by the requirements that it must be 
complex enough to reproduce the experimental data patterns and that the results are independent 
of the precise number of neurons. In particular it has been checked that the results are stable for 
architectures with one more or one less neuron than the reference architecture, 2-5-3-1. This ensures 
that the network architecture is redundant for the problem under consideration. 

To define the optimal training strategy we should determine which is the suitable value of the 
Xstop parameter that defines the dynamical stopping of the training, as discussed in Section 15.2.31 
We will use the overlearning criterion, introduced in the same Section, to determine its value. The 
overlearning criterion to determine the length of the training states that the training should be 
stopped when the neural network begins to overlearn, that is, it begins to follow the statistical fluc- 
tuations of the experimental data rather than the underlying physical law. The onset of overlearning 
can be determined by separating the data set into two disjoint sets, called the training set and the 
validation set. One then minimizes the error function, Eq. 15.341 computed only with the data points 
of the training set, and analyzes the dependence of the error function Eq. 15.341 of the validation set 
as a function of the number of generations. 

Then one computes the total x^i for both the training and validation subsets. It turns out that 
in the present case fluctuations in the data set turn out to be very large, and one has to average over 
a large enough number of partitions to obtain stable results. The onset of overlearning is determined 
as the length of the training such that the of the validation set saturates or even rises while the 

of the training set is still decreasing. 



The neural network approach to parton distributions 



131 



In Fig lt).28| we show the for both the training and vahdation subsets as a function of genetic 
algorithms generations, averaged over a large enough number of partitions. The training partition 
contains the 30% of all the data points, selected at random, while the validation partition includes 
the complementary 70% data points. If A^part is the number of different partitions used to determine 
the onset of overlearning, in Fig. 16. 281 we show both the average value of the total {x^)pi^^^ sjid its 
variance a^2 , defined as 

(x')pa.t = ]^ E X? , (6.107) 

]^ E (X?)^ - ((X\ J - (6.108) 

where xf is the value of the error function for the 1-th partition. The number of required partitions 
A^part has to be large enough so that the resulting distribution is gaussian and therefore the standard 
deviation, Eq. 16.1081 has the standard statistical interpretation. In Fig. 16. 291 we show the histograms 
for the distributions of Xti i s-nd Xvai i over partitions. One observes that in the present case A^part = 
20 is enough to achieve convergence. In Fig. 16. 301 we show the relation between the total and the 
Xstop used in the dynamical stopping of the training. This relation is needed in the determination 
of the appropiate value of Xstop i^ the dynamical stopping of the training to achieve a given value 
of the total x^- The overlearning anaysis points out to the fact that the final total x^ should be 
around x^ ~ 0-8, which from Fig. 16.301 implies a value Xstop ^ 2.0 for the dynamical stopping of the 
training. 




rt c I ■ I ■ ■ ■ ■ I ■ i_i I I ■ ■ I ■ ■ ■ ■ I i_i I i_i ■ ■ ■ ■ 

50 10O 150 200 250 30O 350 400 450 5C0 



Figure 6.28: 

The average (x^) ^ over partitions as a function of the number of genetic algorithms generations. The 
bands show also the associated variance a^2. The up-right plot shows the ratio Xvai/Xtr- 



With the stability estimators defined in Section r5.3.2l one can assess which is the required number 
of trained replicas A^rop to obtain stability of the results. To this purpose, one compares in Table 



The neural network approach to parton distributions 



132 



Distribution of ] 

12| 



Training 

Entries 20 
Mean 0.757 
Error Q.060 



0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 



Distribution of 1 

12| 



10 ValidaliQrr~ 
Entries 





0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 



Figure 6.29: 

Distribution of xf for the different partitions used in the overlearning test, for both the training (left) and 

the vaUdation (right) subset of points. 



16. 181 the results for the probability measure of qNsiXjQo) for different values of A^rep- That is, one 
compares the probability measure as obtained from training a sample of neural networks on iVrop 
Monte Carlo replicas of the experimental data with another probability measure, constructed from 
a different set of A^rcp Monte Carlo replicas. One expects that the differences in this comparison are 
reduced as the value of N^cp is increased. For these stability comparisons, we use in the data regions 
■^dat = 14 points linearly spaced between x = 0.04 and x — 0.75, and the same number of points in 
the extrapolation region logarithmically spaced between x = 0.001 and x — 0.01. It can be observed 
that the required stability is obtained for A^icp = 500, as expected. 



iV„p 


10 


100 


500 




1.26 


1.12 


1.10 




1.29 


1.15 


1.16 


(^^^KDdat 


1.48 


1.15 


1.14 




1.48 


1.32 


1.21 



Table 6.18: Stability estimators, defined in Section 15.3.21 for the probability measure of gjvs(a;,Qo) SiS a 
function of the number of trained replicas A^rcp, both in the data and in the extrapolation region. 



As discussed in detail in Ref. j20H . the most unbiased fitting strategy is to parametrize with a 
neural network the quantity xq^^si^, Qo)- C)n top of that, the nonsinglet parton distribution has to 
satisy the kinematical constraint that 

QNsix = 1, Qg) = , for all QI , (6.109) 

that is, parton distributions vanish in the elastic limit. This constraint will be implemented with 
one of the techniques discussed in Section 15.2.41 the hard-wiring of the kinematical constraint in 
the neural network parametrization. With this two considerations, we write the non-singlet parton 
distribution in Eq. 16. 1031 as 

q^^f^'H^^QD-il-^)^-^^^^^^ , (6.110) 

X 

where now it is the quantity q"^'^'^'"''\x, Qq) the one that is parametrized with a neural network. It 
can be checked that the neural network complemented with some simple functional form dependence 
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Figure 6.30: 

The total x^, as computed in Eq. 15.761 from averages over replicas, as a function of the Xstop used in the 

dynamical stopping of the training. 



is still an unbiased approximant to the true value of the nonsinglet parton distribution. In particular 
we will show that the results of the fit are not affected if in the expression 



~<nct)(/c) 
'iNS 



(6.111) 



the values of the parameters m and n are modified from their default values, m — 1 and n — 1. 
The stability of the results with respect to reasonable variations of the m, n exponents can be made 
quantitative by means of the stability statistical estimators introduced in Section 15.3.21 Fig. 16.311 
show in two particular cases how the results of the fit are stable againts different choices of the 
polynomial exponents m and n in Eq. 16.1111 This comparison is made more quantitative with the 
stability estimators, as can be seen in Table I^T^ 





Small- a; exponent n 


Large- a; exponent m 


{RE M)dat 


0.48 


1.92 




0.39 


0.38 


{RE [a,])^^^ 


0.81 


1.80 




0.92 


2.27 



Table 6.19: The stability estimators defined in Section 15.3.21 for the comparison of fits with different 
polynomial exponents, see Fig. 16.311 The method used to compute these estimators is the same that has 
been used to assess the stability with respect A^'rcp. 



Now we discuss how the results for the parametrization of qNs{x, Qq) depend on the kinematical 
cut in Q^. The kinematical cut in has been chosen rather low {Q^ > 3 GeV^) since first 
of all Target Mass Corrections taken into account in the theoretical expression for the nonsinglet 
structure function F^^ , Eq. 16.991 and second we have not observed evidence, within experimental 
uncertainties, of a dynamical higher twist correction. High quality data at large- a; would allow 
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to extract the higher twist contribution HT(a;) to the nonsinglet structure function with higher 
accuracy with a variety of techniques |2(J21 1203) . even if it is known that the magnitude of the 
extracted HT correction is reduced if evolution is performed at the NNLO level, as it is in our case. 
The kinematical cut in the invariant mass of the final hadronic state W'^ Q'^il - x) > 6.25 GeV^ 
removes those points at the largest values of x for which a Sudakov resummed evolution |32[ I204| 
would be needed. 
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Figure 6.32: 

Comparison of the results for the nonsinglet parton distribution xqNs{x,Qo) for two different kinematical 
cuts: the one for the reference fit > 3 GeV^ and another with a more conservative one > 9 GeV^. 



In Fig. 16.321 we compare the results of the reference final fit, with the standard kinematical cut 
in > 3 GeV^ with another fit with exactly the same training parameters but with kinematics 
restricted to > 9 GeV^. It is clear that the uncertainties in the parametrization of qNs{x,Qo) 
grow sizeably at medium and small x once the subset of data points with 3 < < 9 GeV^ is 
removed from the fit. Note that the size this subset of points is rather large, ~ 150 data points, the 
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bulk of the NMC experimental data. 



Large x Small x 




Figure 6.33: 

The neural network parametrization of the nonsinglet parton distribution, compared to the CTEQ and 
MRST results, both at large— j: (left) and at small-x (right). Note that uncertainties grow very fast in the 
small- a:: region, since there is no experimental data for x < lO"'^. 




Figure 6.34: 

Comparison of the nonsinglet structure function Fi^^ {x,Q^) as computed with the parametrization of 
qNs{x, Qo) discussed here with the structure function parametrization of Ref. for = 5 GeV^ (left) 

and Q2 = 50 GeV^ (right). 



Once the set of A^rcp neural networks have been trained with the strategy described above, the 
third step of our approach is to validate these results by means of statistical estimators, defined in 
Section 15.31 Table 16.201 summarizes the most important estimators for both the total number of 
data points and the individual experiments. Note that the same patterns as in Ref. 11 is observed: 
errors are reduced, which is sign that the neural network has found the underlying law from the 
experimental data, while correlations are increased, in a way that the covariances as computed for 
the probability measure for qNsi^iQo) approximately reproduce the corresponding experimental 
quantities. Note also that the total for the two experiments included in the fit, NMC and 
BCDMS, is very similar. This result is only achieved after a detailed study of the optimal weighted 
training strategy, as discussed in Section 15.2.31 

Now we discuss our final results for the parametrization of the nonsinglet parton distribution 
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Tr>f cil 1\TA/TP RPT^A/TQ 
lOLcLl IN iS/lVy JjV^UiViO 


{E) 

J- |"^(art)j 


0.77 0.78 0.75 
2.07 1.94 2.18 
0.80 0.78 0.95 




0.011 0.017 0.006 
0.004 0.005 0.003 
0.51 0.-0.04 0.91 


(P"f)Hat 


0.18 0.04 0.16 
0.47 0.43 0.50 
0.31 0.19 0.39 


('=0-^";^)Hat 

r [cov(^''*)] 


8.6 10-^ 1.0 10-5 7.2 10-6 
9.5 10"*^ 1.4 10-5 5.5 10-6 
0.25 0.22 0.76 



Table 6.20: Statistical estimators for the validation of the results. 



<lNs{x,Q^), and particular attention is paid to the comparison with the same parton distribution 
as obtained with the standard approach described in Section In Fig- 16.331 the results of our 
parametrization of q^^s^x, Qq) are presented, both at large x and at small x, where we also compare 
with the results of the CTEQ (the CTEQ61 analysis of Ref. Tl!) and MRST (the MRST2001E of 
Ref. |205j ) collaborations. The most striking feature of our results is that the size of the error band 
at small x is much larger than in the fits with the standard approach. Note that even if the global 
fits of Refs. O EOSI include much more data than the present analysis, the low x behavior of the 
nonsinglet parton distribution is only constrained by the data on the nonsinglet structure function 
F^^ that is used in this analysis. 

We can also analyze the effects that including the information of QCD parton evolution has 
on the parametrization of experimental data. These effects can be seen in Figs. 16.341 where the 
results of the present analysis are compared with the parametrization of the nonsinglet structure 
function F^^ {x,Q'^) from Ref. JT]. One observes that the two results are consistent within the 
respective uncertainties. Note that due to the effects of QCD evolution, experimental measurements 
of the nonsinglet structure function F.^^{x,Q'^) for different values of correspond to the same 
measurement of gNsi^jQa) repeated many times. 

In summary, in this part of the thesis we have described the first application of the technique 
described in Chapter to the parametrization of parton distribution functions. The main result 
appears to be that uncertainties in parton distributions obtained with the standard approach are 
underestimated, specially in the extrapolation region. The results of this part of the thesis will be 
continued in Ref. |180| . were the full singlet evolution will be considered. 



Chapter 7 

Conclusions and outlook 



In this thesis we have described in detail a general technique to parametrize experimental data in an 
unbiased way with faithful estimation of the associated experimental uncertainties. This technique 
was first introduced in Ref. [Tl] and during the course of this thesis it has been improved and 
extended to the analysis of other processes of interest. In particular we have shown the first appli- 
cation to the problem that motivated its development, the parametrization of parton distribution 
functions. 

The general strategy introduced in Ref. TT has been extended in several ways. First of all, the 
technique has been applied to different situations than the original application, the parametrization 
of deep-inelastic structure functions from a moderately large data set. In these new applications, 
for example, we have faced problems with a very large amount of experimental data from different 
experiments and problems in which the parametrized quantity is related to the experimental data 
in through complicated relations, like convolutions. Second, new minimization algorithms for neural 
network training have been introduced, in particular genetic algorithms which are suited for the 
minimization of highly nonlinear error functions. We have also introduced more refined criteria to 
assess the optimal point to stop the training of the neural networks. Additional statistical estimators 
to assess several characteristics of the probability measures have also been introduced. Finally, 
the application of the strategy discussed in this thesis to several different processes, deep-inelastic 
structure functions, hadronic tau decays, semileptonic B meson decays and parton distribution 
functions, increases our confidence on its general validity. 

The number of possible applications of the general strategy to parametrize experimental data 
described in this thesis is rather large. The most important one is to generalize the results of Section 
16.41 to the singlet sector 180' and to produce a full set of neural network parton distributions with 
a faithful estimation of their uncertainties. Another promising application is the parametrization of 
the nonperturbative shape function from semileptonic and radiative B meson decays. Astroparticle 
physics is another field in which several applications of the neural network approach have been 
envisaged. In particular, there is an ongoing project ppfii in which the general strategy discussed 
in this thesis is used to determine the atmosferic neutrino flux from experimental data on neutrino 
event rates. 
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Chapter 8 

Conclusiones 



En la presents tesis doctoral hemos descrito en detalle una novedosa tecnica general para construir 
la densidad de probabilidad de una funcion a partir de medidas experimentales, esto es, una tecnica 
para parametrizar datos experimentales, sin necesidad de hacer hipotesis alguna sobre la forma 
funcional de la funcion a parametrizar y con una estimacion fidedigna de las incertidumbres asoci- 
adas, que permite una propagacion de los errores a observables arbitrarios sin necesidad de asumir 
aproximaciones lineales. Esta tecnica fuc introducida en ^Jj y durante el transcurso de esta tesis 
doctoral ha sido mejorada en diferentes aspectos y extendida mediante su aplicacion a otros procesos 
de interes. En particular hemos mostrado su aplicacion al proceso que motivo originariamente el 
desarrollo de esta tecnica, la parametrizacion de distribuciones de partones en el proton. 

La tecnica general ha sido extendida en diversas direcciones. Primero de todo, hemos demostrado 
la generalidad de esta tecnica mediante su aplicacion a cuatro procesos de interes, cada uno de ellos 
diferente a la aplicacion original descrita en 11 . Por ejemplo, hemos tratado problemas con un gran 
niimero de datos provenientes de diferentes experimentos, asi como problemas en los que la relacion 
entre los datos experimentales y la funcion que estamos parametrizando viene dada por una serie 
de convoluciones. En segundo lugar, nuevos algoritmos para el entrenamiento de redes neuronales 
han sido introducidos, en particular los conocidos como algoritmos geneticos, que son necesarios 
para la minimizacion de funciones de error altamente no lineales. Finalmente, hemos ampliado el 
conjunto de tecnicas estadisticas usadas en la validacion de los resultados, esto es, en la comprovacion 
quantitativa de como la densidad de probabilidad construida reproduce las caracteristicas de los datos 
experimentales. 

Las posibles aplicaciones de la estrategia general para parametrizar datos experimentales descrita 
en la presente tesis doctoral son ciertamente numerosas. La mas imporante de estas es la general- 
izacion de los resultados decritos en la Seccion l^^ al sector singlet de las distribuciones de partones, 
para obtener de esta manera un conjunto completo de distribuciones de partones parametrizadas con 
redes neuronales con estimacion fidedigna de las incertidumbres asociadas. Otra aplicacion promete- 
dora es la parametrizacion de la shape function, una funcion que contiene los efectos no perturbativos 
dominantes en un cierto tipo de desitegraciones del meson B. Finalmente, otra aplicacion interesante 
consistiria en la parametrizacion del flujo de neutrinos atmosfericos a partir de datos experimentales 
de detecciones de neutrinos en experimentos como Super Kamiokande. 
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Appendix A 

Elements of statistical data 
analysis 

A.l Review of probability theory 

In this Appendix we review some basic elements of probability theory |1(JUI I162| . Let us consider 
with full generality a set of A^dat observables Fi, i — 1, . . . , Adat- One has, for each observable, A^rop 

Ik) 

independent measurements, which will be denoted by i^j ik — 1, . . . , A^rep- As it is clear from the 
notation, we have in mind the general strategy to construct a probability measure in the space of 
the observable F described in Chapter [S] where A^rcp stands for the number of generated replicas of 
the experimental data. From this set of measurements, one can construct the following statistical 
estimators for each observable F^: 



• Mean: 



(A.l) 



Variance of the data: 

cT^^{{F,-{F,)f)={Fi)-{F^,f , (A.2) 

Correlation between data points: 

_ m - im {F, - {F,))) _ {F,Fj) - (f,) {F,) 

Pij — — ■ l^-'J; 

(TjCTj aiaj 

Unless otherwise indicated, all the averages are performed with respect to the A'^op measurements, 
and we assume that we are in the limit where A'lop is very large. The above estimators describe 
features of the underlying probability density of the experimental data, and they approach the true 
values as the number of measurements A'rcp becomes very large. 

This result is quantitatively described by the variances of the different estimators. These esti- 
mator measure the difference of the values of the mean, variance of the data and correlations as 
determined with averages over measurements with respect to their true values. These variances, 
written in terms of moments of the Fi, are given by 



1. Variance of the mean: 
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2. Variance of the error 
1 



3. Variance of the correlation 



{{F. - {F) f) at] = ^ {(F^) - 4 (F,) (F,') + 6 (Ff ) {F.f - 3 (F.)^ - of) 

(A.5) 



1 



rep 



((F,-(F,))(F,-(F,))) 



2 



1 / 1 



(J^^F/) _ 2 (F,) (F,F2) - 2 (F,) (FfF,-) 



+4 (F,F,) (F,) (F,) + (Ff ) (F,-)^ + (F,)^ (F,^) - 3 (F,f (F,) 



which can also be written as 



■op 



(A.6) 



(A.7) 



It is clear from the above expressions that the variances of the mean, the error and the correlations 
decrease when the number of measurements is increased. This implies that as statistics are increased, 
the measured values of the mean, error and correlations are closer to their true values. Therefore, 
to compare different probability measures, the variances of the mean, the error and the correlation 
as defined above should be used. 



A. 2 The Monte Carlo approach to error estimation 

In this Section we show with a simple example that the Monte Carlo approach to error estimation 
described in Section |5l is equivalent to the standard approach, based on the condition A^^ — 1 
for the determination of confidence levels, with the assumption of gaussian errors, up to linearized 
approximations. Then we present an example of the application of the Monte Carlo approach to 
error estimation in the case of standard functional form fits. 

Let us consider two pairs of independent measurements of the same quantity, x\ ± (j\ and X2 ± 02 
with gaussian uncertainties. The distribution of true values of the variable a; is a gaussian distribution 
centered at 

-x^^-^^i^. (A.8) 

and with variance determined by the Ax^ = 1 tolerance criterion, 

^2 ^ _alal_ 

To obtain the proof of the above results, note that if errors are gaussianly distributed, the maximum 
likelihood condition imply that the mean x minimizes the function 

X = — -2 — + — —2 — . (A.IO) 

"\ "2 

and the variance tr is determined by the condition 



Ax' {x^g)-x^ {x) 



(A.ll) 
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which for A^^ — 1 leads to Eq. IA.9I Note that these properties only hold for gaussian measurements. 

An alternative way to compute the mean and the variance of the combined measurements xi 
and X2 is the Monte Carlo method: generate N^ep replicas of the pair of values xi,X2 gaussianly 
distributed with the appropriate error, 

x[^'^ ^xi+r^^^au l,...,7V,ep , (A.12) 

=a;2+r^''V2, k = 1, . . . , N,,p , (A.13) 

where r^''^ are univariate gaussian random numbers. One can then show that for each pair, the 
weighted average 



-(fc) ^ "^^ ' ^ (A. 14) 



an + x\ a. 



al + al 



is gaussianly distributed with central value and width equal to the one determined in the previous 
case. That is, it can be show that for a large enough value of A^rep, 



-— (A.15) 

^^"P k=l 



and for the variance 



which is the same result, Eq. IA.9I as obtained from the Ax^ — 1 criterion. This shows that the two 
procedures are equivalent in this simple case. 

The generalization to A^dat gaussian correlated measurements is straightforward. Let us consider 
for instance that the two measurements xi and X2 are not independent, but that they have correlation 
Pi2 < 1- To take correlations into account, one uses the same Eqns. IA.12I and IA.13I to generate 
the sample of replicas of the measurements, but this time the random numbers rj'^^ and r2*'' are 
univariate gaussian correlated random numbers, that is, they satisfy 

^^■■<=P k=i 

With this modification, the sample of Monte Carlo replicas of xi and X2 also reproduces the exper- 
imental correlations. This can be seen with the standard definition of the correlation. 



x[''^ - Xi^ ^Xj'"'' ~ ^2 , 



cri(T2 



= (rir2),, = P12 . (A.18) 



/ rep 

rep 



Therefore, the Monte Carlo approach also correctly takes into account the effects of correlations 
between measurements. 

In realistic cases, the two procedures are equivalent only up to linearizations of the underlying 
law which describes the experimental data. We take the Monte Carlo procedure to be more faithful 
in that it does not involve linearizing the underlying law in terms of the parameters. Note that as 
emphasized before, the error estimation technique that is described in this thesis does not depend 
on whether one uses neural networks or polynomials as interpolants. Conversely, one could derive 
l-cr errors on the parameters of the neural network as an alternative to estimate the uncertainties 
in the parametrized function. 

As an example of the application of the Monte Carlo error estimation to standard fits of parton 
distributions with polynomial functional forms, we repeat the nonsinglet fit of Refs. |2U71 120!^ . 
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We use exactly the same techniques as discussed in Section 16.4.21 but with a functional form to 
parametrize qNsi^i Qq) instead of a neural network, which is taken to have the functional dependence 
PIT7| 



qNsix,Ql) = - {uy 



2{d~ u)mrst) {x, QI) 



were we have defined 



{d-u) 



MRST 



u,(x,Ql) = A„^x'^"(l-x)^" ( 

dv {x, Qq) 

{x,Ql) = 1.195a;' 



1 - 1.108x5 



26.283X 

Ad^x^-'il ~ x)''" (^1 + 0.895X* + 18.179X 
"■^''(1 - x)^-^° (1 + 14.05X - 45.52X 



(A.19) 

(A.20) 

(A.21) 
(A.22) 



and the (d — u) combination is taken from the MRST global analysis 'TS'. The normalization 
constants are fixed by the conservation of the number of valence quarks 



/ dxuy{x) 
Jo 



2 , 



dxdy{x) = 1 



(A.23) 



The values of the parameters obtained from a fit to the experimental data are summarized in Table 
lA.ll were we compare with the results of the original fit |207| . In particular one observes that the 
exponent which governs the small-x behavior of the nonsinglet parton distribution, Qu, is correctly 
reproduced as expected, since at small- a; the experimental data that determines the behavior of 
QNsix, Qq) is the same in the two cases. 





a-u 


bu 


ad 


bd 


Refs. 207, 208 


-0.686 


4.199 


-0.587 


6.190 


NNPDF 


-0.705 


0.844 


0.384 


1.035 



Table A.l: 

The results of a fit to the nonsinglet structure function F^^ {x, Q^) for a parton distribution with functional 
dependence given by Eq. Ulol compared with the results of the fit of PTflHTS) . 

Note that Refs. [ 2071 120ij) have a different parametrization above and below x = 0.3, while we 
take only the one they use for x < 0.3 and that are the large x behavior they use also HERA data. 
One observes in Fig. lA.ll that the small- a; behavior of our polynomial fit coincides precisely with the 
small- a; behavior of the nonsinglet parton distributions from the MRST and CTEQ global analysis. 
Note also that at medium and small— a; the uncertainties as determined with the standard methods 
introduced in Section IT!^ appear to be underestimated. 

A. 3 Correct treatment of normalization uncertainties 

In Section l5T^ we have defined different estimators that are minimized during the neural network 
training. To understand the relations between the different estimators we have used, we can use 
a simple model. This model is simple enough to be solvable in terms of compact expressions, and 
is therefore suitable to obtain intuition on the expected behavior of the different error functions. 
This exercise will be also useful to understand the effects of an incorrect treatment of normalization 
uncertainties. In the spirit of D'Agostini's analysis j^H, we consider the simple case in which two 
results of the same physical quantity, Xi and X2 are available, and we know the statistical ct;, the 
systematic CTc and normalization af uncertainties. The discussion can be generalized to the case of 
additional measurements with different systematic uncertainties. 
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Figure A.l: 

The results of a polynomial fit with the Monte Carlo method for error estimation with the parametrization 
of Ref. |207| . compared with the corresponding fit with neural networks (left) and with the standard fits of 

the CTEQ and MRST collaborations (right). 



The best theoretical prediction in this case will therefore correspond to fitting a constant which 
we will denote by k. The diagonal error, Eq. 15.331 will be given by 



i?2 



{Xi - kf {X2 - kf 



(A.24) 



Note that to derive such expression one has to assume that the experimental measurements are 
gaussianly distributed. If normalization uncertainties are neglected, then the expression for the 
covariance matrix error, Eq. 15.341 is given by 



l + kf ^ + {a:2-kf^-2 [x, - k) {X2 -k)^ 

Ui TP °1°2 °1°2 °1°2 

and finally the full x^, including normalization uncertainties, Eq. 15. 761 is given by 

l + (-i-fc)^^ + (-2-fcf%J^-2(.,-^)(x.-^)^ 



(A.25) 



X 



(A.26) 



Note that for example as CTc ^ the correlated error function reduces to the uncorrelated one 
i?2, as expected. 

Once we have defined the different error functions, we can compute the values of k for which 
each of the different error functions has a minimum. This is achieved imposing the conditions 



dk 



E2{k) 



= 0, 



dk 



Em 



= 0, 



dk 



= . 



(A.27) 



k=k 



For the diagonal error, Eq. IA.24I one has the standard weighted average 



(A.28) 



The neural network approach to parton distributions 



146 



then for the covariance matrix error, Eq. I A. 2 51 one has the same minimum as before 

This points to the fact that in a reahstic case the result of a minimization of the diagonal error 
function, Eq. 15.331 should be rather similar so the corresponding result when the minimized error 
function is Eq. 15.341 Finally for the full with normalization uncertainties one has 

which as can be rather different from the naive estimation Eq. IA.28I if data are incompatible and 
normalization error is sizeable. This is true even if normalization effects are small if the measured 
values are inconsistent, since the effect of normalization uncertainties is proportional to cr/(xi — X2) 
are thus can be arbitrarily large. In particular one has a sizeable effect if 

2 

2 i^i-^2f > 1 , (A.31) 



so one can have much larger effects than those naively expected from the value of ct/. This shows that 
the error function with normalization errors as defined in Eq. IA.26I leads to completely unexpected 
and anti-intuitive results. 

The quantities that are relevant to compute are the values of the different error functions at the 
different possible minima kt . The first one is the value of the diagonal error when minimizing the 
same error, then one has 

E2 ik2) = . (A.32) 

Another interesting quantity is the ratio between E2 and when minimizing either E2 or E^ (since 
as long as normalization uncertainties are not included the minimum is the same for both quantities). 
This ratio is given by 

MM = 11^!^ fA33) 
"'1 "'2 

This ratio is typically of order 1, showing why both errors are comparable in fits without normal- 
ization error. In particular if xi — X2 one has 



2 



E2 (fc2) _ , _ 



,^ ^l + ac o , > 1 ■ (A.34) 
^3(^2) afa^ - ^ ' 

Finally let us consider the case in which we minimize the full with normalizations errors included 
in the experimental covariance matrix Eq. 15.761 and we compute the following ratio at the value 
ky2 for which Eq. 15.761 has a minimum 



{X1-X2) 4 ( 2 



E2 {k^2 ) 1+ a-+a- ^, 
E2{k2 



2 {xi-X2)^ 



1 - f f 



(A.35) 



Note that the above quantity depends not only on (xi — X2) but also on the absolute magnitude 
xi,X2 of the measurement. The above ratio always verifies the property 

E2 (k^2) 
E2 (K2) 
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Therefore including normalization errors in the minimized results not only in a lower value of the 
fitted parameter k, but also on a larger diagonal error E2, and the two effects arise from the same 
source: combination of inconsistent data and normalization errors. This effect can be very large 
even if normalization errors themselves are small due to the presence of inconsistent data. One can 
explicitely check that the same conclusions hold in the case that the inconsistent data comes from 
different experiments. 

To check the effects of the incorrect treatment of the normalization uncertainties in a more 
realistic fit, in Fig. IA.2I we have repeated the F2{x,Q^) parametrization described in Section IfT^ 
but with the incorrect treatment of normalization uncertainties, that is, with the minimization of 
the error function with the covariance matrix Eq. 15.61 rather than with Eq. 15.351 which does not 
include the normalization uncertainties. One observes that the results of the incorrect treatment of 
the normalization errors is that the structure function is systematically lower than the result with 
the correct treatment, and this effect is much larger than the size one would naively expect since 
normalization errors are of the order of 2 — 3%. 




X 



Figure A. 2: 

Comparison of parametrizations of the proton structure function F2{x,Q^) with the correct and incorrect 
treatment of normalization uncertainties. Note that the effect is much larger than the one naively expected 
from the relative size of normalization errors, ajv ~ 2 — 3%. 
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Appendix B 

Overview of global parton fits 



In this Appendix we summarize the present status of global fits of parton distribution functions. The 
standard approach to the determination of parton distributions from experimental data has been 
discussed in detail in Section IT!^ Wc do not attempt now to review all the huge available literature 
in the subject but rather to provide the reader with a brief description of the current status of the 
field. Much more detailed information can be obtained from the original references as well as from 
proceedings of workshops like \1H'^\ m] . 

During the 1980's and the early 1990's many sets of parton distributions were developed to try 
to describe all the available hard scattering data EHHl 1^ EH E3 ESI EH- Today the most 
commonly used sets of parton distributions are those of the CTEQ and MRST Collaborations. 
This is so because these collaborations take into account all modern data from a wide variety of 
experiments as well as the progress in perturbative QCD computations, and provide regular updates 
of their parton distributions sets. Now we review the current status of the global analysis of these 
two groups. Note that even if these two groups release new versions of their sets rather frequently, 
in general these updates are only minor changes of a basic set, like CTEQ4 or CTEQ5. Note also 
that all modern QCD analysis provide estimations of the uncertainties associated to the parton 
distribution functions. 

The MRS(T) Collaboration presented his first global parton fit in Ref. |215j . Then this global 
fit was sequentially improved from a series of works: ESI ESI EH EH E^Ol EUl UM One of 

their latest sets of parton distributions is MRST2001 |221| . which is described in some detail in the 
following. The experimental data that is used in the fit is given by: 

• Neutral current deep-inelastic structure functions from the HI and ZEUS experiments at the 
HERA e=tp collider. 

• The ZEUS measurement of the charm contribution to the DIS structure function, -^^^(a;, Q^)- 

• The fixed target DIS structure functions measurements from the CCFR, BCDMS, NMC and 
E665 experiments, as well as preliminary data from the NuTeV experiment, for different types 
of targets. 

• Inclusive jet cross sections from the DO and CDF detectors at Fermilab pp collider Tevatron. 

• The E866 measurements of the Drell-Yan process for both proton and neutron targets, as well 
as previous measurements by the E605 experiment. 

• The measurement of the W-lepton asymmetry from the CDF detector at Tevatron. 

Note that the prompt photon data, that was used for some time in parton fits, is not included any 
more due to theoretical problems as well as possible inconsistencies. 
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MRST2001 is a global NLO QCD analysis with starting evolution scale Qo = 1 GeV, that uses 
the M S renormalization scheme and the Thorne-Roberts scheme for the treatment of heavy quark 
mass effects. The kinematical cuts are given by > 2 GeV^ and > 12.5 GeV^. The parton 



distributions at the initial evolution scale are parametrized by 

xuv{x,Ql) = x{u-u) {x,QI) ^ AuX^'^il - xY" (^1 + duVx + Cux) , (B.l) 

xdvix, Ql) =x{d-d) {x, Ql) ^ Adx''" (1 - xY" (1 + ddV^ + edx) , (B.2) 

xS{x,Ql)=A,x''^il-xy''{l + d,y/^ + e,x) , (B.3) 

xS{x,Ql) = 2x{u + d + s) , (B.4) 

xg{x, Ql) = Agx''^ (1 - xy<' (1 + dg^ + egx) - FgX^^il - xY^ , (B.5) 

2u, 2d, 2s = OAS + A, 0.45* - A, 0.25 , (B.6) 

xA = x(d-u) = Aax*"^ (1 - xY"^ (l + dAX + CAX^) . (B.7) 



As well as the parameters that describe the nonperturbative shape of the parton distributions, for 
this analysis also the strong coupling as (M^) is fitted, resulting in a value consistent with the 
current world average [20]. The associated uncertainties to the above parton distributions from 
experimental errors were discussed in detail in Ref. |205| and those associated to experimental 
uncertainties, like the perturbative order, higher twist corrections or Inl/x and ln(l — x) effects, 
in Ref. 80 . In Fig. IB. II we show the parton distributions that result from the global analysis 
discussed above at the scale — lO"' GeV^. Note that at such a large energy scale, the gluon 
distribution becomes dominant at medium and small x, and the contribution from heavy quarks 
becomes sizeable. At large evolution lengths the shape of the parton distributions is essentially 
determined by perturbative evolution and becomes less dependent of the initial nonperturbative 
condition at Qq. 
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Figure B.l: 

The MRST2001 partons at the scale = 10* GeV^ 

The CTEQ Collaboration has performed global analysis of parton distributions since the early 90s 
1^1^ 1^1^ until now. One of their latest work is named CTEQ6 |H1, that is summarized in 
the following. As will be seen this analysis is very close to that of the MRST2001 analysis discussed 
above. CTEQ6 uses as experimental input: 
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• Neutral current deep-inelastic structure functions from the HI and ZEUS experiments ate the 
HERA e^p collider. 

• The fixed target DIS structure functions measurements from the CCFR, BCDMS and NMC 
experiments. 

• Inclusive jet cross sections in several rapidity bins from the DO detector at Fermilab pp collider 
Tevatron. 

• The E866 measurements of the Drell-Yan deuteron to proton ratio, and the E605 measurement 
of the Drell-Yan cross section. 

• The measurement of the W-lepton asymmetry from the CDF detector at Tevatron. 

For all these experiments, all the information on correlated systematic uncertainties is available. 
Note that even if global QCD analysis succeed in describing a wide variety of hard-scattering data, 
the precision DIS structure function measurements from HERA and fixed target experiments still 
provide the backbone of parton distribution analysis. 



crsqau 

llESfPSMi 




Figure B.2: 

A comparison of the CTEQ6M partons with the MRST2001 partons at the initial evolution scale Q — 2 

GeV. 



The nonperturbative input to the parton global analysis, as has been discussed in Section^^l are 
the parametrization of the parton distributions at a starting evolution scale, which in the CTEQ6 
set is taken to be Qq = 1.3 GeV. Let us recall (see Section [4. 1.2|l that the parton distributions at 
Q > Qo are determined by the NLO DGLAP evolution equations. The functional form used in the 
CTEQ6 analysis is 

xf{x,Qo)^Aox'^'{l-x)^'e^-'^l + e^^x)'^' , (B.8) 

with independent parameters for the flavor combinations u ^u, d — d, g, and u + d. The strange 
parton distribution is kept fixed to s = s = 0.2 (m + d) /2. Also the sea quark ratio 

^ = Aox^ni - a;)^' + (1 + ^32^) (1 - a;)^* , (B.9) 
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is parametrized. Some parameters are held fixed, for a total of 20 free parameters to model the 
nonperturbative parton distribution shape at the input scale Qq. From the CTEQ6 global analysis 
not only the best-fit set of parton distributions are determined from experimental data, also the as- 
sociated uncertainties in the parton distributions are estimated using some of the methods discussed 
in Section IT!^ see the original reference [S^ for a more detailed description of the results of this 
global analysis. 

Note that the two main groups performing global fits of parton distributions, MRST and CTEQ, 
use a very similar set of experimental data, similar assumptions on the nonperturbative shape of 
the parton distributions and quite similar methods to determine the associated errors to the parton 
distributions. This means that the spread of the MRST and CTEQ results by itself cannot be 
taken, even at a qualitative level, as a measure of the uncertainty in the determination of parton 
distributions. In Fig. IB. 21 we show the results of the CTEQ6 global QCD analysis discussed above 
compared with the MRST2001 partons. Note the good agreement between the two analysis for the 
u{x) and d{x) partons, while the difference is larger for the gluon, as was to be expected since the 
gluon distribution has rather large uncertainties. 

Finally let us review some other recent determinations of parton distributions. These global 
analysis are less frequently used in phenomenological studies than the sets from the CTEQ and 
MRST collaborations, since in none of the following cases all the available hard-scattering data is 
included. S. Alekhin has presented several QCD analysis of deep-inelastic scattering data |22tjll227j . 
with emphasis in the consistency of the statistical analysis of the data. The GRV published a series 
of parton analysis ^28, 229, 230, 231 whose main feature was that a very low starting evolution 
scale Qq for the evolution. However, since the release of their last set of parton distributions |231| 
no updates have appeared. In particular this group did not produce an estimation of the associated 
uncertainties in their parton distributions. Finally, in the last years different groups have published 
additional global fits of parton distributions, each one with different motivations. For example the 
analysis of BPZ 1232) devotes special attention to the accurate determination of the sea strange 
parton distribution, while the Fermi partons j233| . which have some overlap with our approach, also 
attempt to construct a probability measure in the space of parton distributions. 

Summarizing, global fits of parton distributions have been a field of active development in the 
recent years. Two sets of parton distributions (MRST and CTEQ) are nowadays commonly used in 
most phenomenological applications, since they include all available experimental data and provide 
regular updates of their parton sets. Note that as discussed before, in both cases the experimental 
data, the theoretical assumptions and the technique to assess the uncertainties are rather similar, 
and therefore theoretical predictions for observables using the two sets of parton distributions are 
in general in good agreement. 
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